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EXPERIMENTAL DESIGN IN INDUSTRY* 


H. C. HAMAKER 
Philips Research Laboratories 
N.V. Philips’ Gloeilampenfabrieken 
Eindhoven, Netherlands 


1. Introduction 


The design of experiment and analysis of variance are techniques 
which have been developed mainly in connection with agricultural 
research. The majority of examples in textbooks on the subject are 
consequently drawn from this field. 

These techniques also apply to industrial and technological experi- 
mentation. The main purpose of this note is to emphasize that the 
conditions under which these techniques have to be applied in industry 
are in several respects essentially different from those prevailing in 
agriculture; and that these differences imply changes in the method of 
teaching, of analysis, and of presentation if we are to reap the full 
benefit and efficiency of these statistical methods in the industrial 
field. 


2. The difference in speed and its consequences 


One of the main differences is a difference in speed. The agricultural- 
ist is usually restricted to one experiment in a year. Hence he has 
ample opportunity to plan his experiment, to carry it out, and to 
analyze it before the planning of the next experiment is started. If we 
are limited to one experiment per season it is evidently of great value 
to set up a fairly complex experiment so that the maximum of infor- 
mation can be collected in a reasonable time. It will likewise pay to 
have the entire procedure supervised by a staff of competent scientists 
with a university background. 

Not so in industry. A single machine produces 1200 light bulbs 
or radio valves in one hour; and even a slower product such as the 
wheel of a railway carriage is shaped within about 10 minutes. Also, 
where mass production is going on on a vast scale and is consequently 


*Contribution to the third International Biometric Conference held in Bellagio, 1-5 September 
1953. 
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organized as a routine, its supervision is not as a rule in the hands of 
scientists but is entrusted to personnel with a secondary-school edu- 
cation, maybe with some additional technical training. 

Carrying out a designed experiment under those conditions requires 
a considerable effort in organization, and production managers will 
only be inclined to undertake such methods of experimentation if they 
are convinced of their utility and if they are able to comprehend the 
useful results achieved. 

In the first place this requires a not too complex set-up of the 
experiments; two- or three-way classifications and latin squares are 
already an important improvement as compared to the one-way classi- 
fication of classical laboratory experiments. Owing to the high speed 
such simple experiments can easily be repeated with slight alterations; 
and more complex experiments, which are difficult to explain and may 
aim at too much information at once, can often be avoided. 

Another and very important requirement is that we must explain 
our purposes in a language that factory people will understand, that is, 
by numerical example. We have all learned from early childhood to 
reason in figures, rather than in symbols and abstract formulae. Pro- 
duction managers in particular, whether they are statistically minded 
or not, look daily at figures representing their stock of raw materials, 
the amount and quality of items produced, the scrap discarded, etc. 
They will consequently grasp the meaning of a designed experiment 
much sooner and easier from a numerical example than from a symbolic 
model, provided the numerical data are presented in a form which 
corresponds as closely as possible to the producer’s technological 
experience. 

We will illustrate this by some numerical examples which all refer 
to the simplest type of designed experiment, viz. the two-way classi- 
fication. 


3. Some industrial examples of two-way classifications 


Five nickel rods of 1 mm diameter are put in a metallic clamp, 
jointly immersed in a suspension of aluminum oxide (fig. 1), and for a 
few seconds a tension of some 100 volts is applied between the nickel 
rods and the electrically conducting vessel containing the suspension, 
the rods being negative. Since the oxide particles carry a positive 
charge owing to their colloid properties, they move towards the nickel 
electrodes and are deposited there as an oxide coating. To investigate 
how the amount deposited varies with position and height, the thickness 
of the coating was observed in three heights (H, — H;) on each of the 5 
rods (positions P, — P;), the results being as recorded in table 1. 
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TABLE 1 


rods 


Position of Ni rod 


P, P, P; P, P; 
Z;; in microns Zi. 
i, 125 130 128 134 143 132 
Height H;, 126 150 127 124 118 129 
H; 130 155 168 159 138 150 
Z.; 127 145 141 139 133 137 
TABLE 2 


Analysis of variance of the data of table 1 


Source 8.8. D.F. MSS. 
Positions P 600 4 150 p? 
Heights H 1290 . 2 645 
Residual 1168 8 146 


If we analyze these data in the usual way (table 2) we find sig- 
nificant differences between heights but no significant differences 
between positions. The trouble: is, however, that if we present the 
analysis in this form to people in the factory, where the technique is 
used for coating the heating filaments of radio-valve cathodes, it will 
not be understood. Reading and interpreting sums of squares, degrees 
of freedom, and mean squares requires a sophisticated statistical train- 
ing which we do not encounter in an industrial environment. Besides, 
if we accumulate a sufficient number of observations even the smallest 
differences will eventually become statistically significant, but this 
statistical significance tells us nothing whatsoever as to their tech- 
nological significance. An effect that is statistically significant may 
be technologically quite negligible; and conversely statistical insignifi- 
cance does not disprove the presence of effects that may be tech- 
nologically worth consideration. 

Hence for industrial purposes we must seek 2 method of presentation 
which is more easily understood. We have found the method explained 
in table 3 very useful. 
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FIG. 1. FIVE NICKEL RODS SIMULTANEOUSLY COATED WITH ALUMINUM OXIDE 
LAYER BY ELECTROPHORESIS. 


TABLE 3 
A simple and useful presentation of the analysis of the data in table 1 


Average Positions Heights Residual 
P, —10n 
P, + 8 A, — 5p 
137 » Ps + 4 H, — 8 
+32 +13 8 = 
P; -— 4 y= 8 
8 11.3 = 2 
3’ 12/+/3 = 6.93» =8 | 12/5 =8 


The general average is 1374 and we may express the influence of 
height and position by computing the difference between row and 


column averages from the general average. 


These data are given in 


table 3; from them we see that the average layer thickness in position 
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Lis 1Ou below the general average and for height 3 it is 13m above. 
These are figures from whieh the man in the factory can judge the 
technological importance of the effects observed. 

From these data we may now proceed to predict that the average 
thickness in position 2 and height 3 will be 


137 +8 + 13 = 158p, (1) 


and by carrying out the same computation for all positions and heights 
we find that we can explain the part of the observations recorded in 
table 4; and by subtracting from the original data we find what portion 
is still left unexplained. 


TABLE 4 
The parts of the data of table 1 that are explained and unexplained by the simple 
additive formula (1). 


Part explained Unexplained 
122 140 136 134 128 +3 -—10 —8 0 ! +15 
119° 137) | -7 
140 158 154 152 146 | —10 +7 -8 


Clearly equation (1) assumes that positions and heights act in- 
dependently so that their effects may simply be added. The unexplained 
part of the observations may partly be due to this assumption being 
too simplistic, partly they may result from random fluctuations which 
can never be avoided. If we interpret the unexplained part as entirely 
due to random errors we can proceed to estimate the standard deviation 
by dividing the sum of squares by the number of degrees of freedom; 
since the sums of the elements of the unexplained part of the obser- 
vations are zero both in a horizontal and in a vertical direction it is 
easily seen that all elements are determined when the eight elements 
within the dotted frame are fixed; hence the number of degrees of 
freedom is 8. 

From the estimate of error, s = 12u, vy = 8, thus obtained we may 
predict a standard deviation between positions of 12/VW3 = 6.9u, 
since the average for each position is based on 3 observations. The 
actual standard deviation computed from the position averages is 
7.1u, » = 4 and this is clearly not significant. The same argument ap- 
plied to heights yields a predicted standard deviation of 12/ V5 = 5.Au 
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TABLE 5 
Fluidity of iron as a function of Si-content for three replicates 


1.25 1.50 1.75 2.00 2.25 % Si 


1 47.5 60.0 65.0 72.5 77.5 
Replication 2 55.0 55.0 67.5 75.0 85.0 
3 37.5 50.0 70.0 75.0 75.0 


TABLE 6 

Various methods of applying analysis of variance to the data of table 5 

Source S.S. DF. MSS. 
I Treatments 2177 4 

Residual 275 10 28 
Source 8.S. DF. MS. 
II Linear term 2125 1 2125 
Residual 326 13 25 

Source 8.8. D.F M.S: 
Ill Linear term L 2125 1 2125 
Quadratic term Q 33 1 33 
Replications R 90 2 45 
RXL 40 2 20 
RXQ 108 2 54 
Residual 55 6 9 


against a computed value of 11.34, »y = 2, and the appropriate statistical 
test shows that this difference is significant at about 5% level. 

Of course the entire argument is essentially that of the analysis of 
variance, but it presents the results in a form that will be much more 
easily understood. From the differences between positions and between 
heights as given in table 3 the industrial technician will at once realize 
the technological importance of the effects observed. Also standard 
deviations, expressed in a dimension with which factory operators are 
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familiar and directly associated with the 20 and 3o limit concepts, are 
more easily understood. Variances must be considered as part of a 
statistical jargon which should preferably be avoided. 

As stated before we have found this method of presentation most 
useful, because by it people will easily grasp the meaning of the statisti- 
cal analysis and will be induced to apply it to good purpose to their own 
observations. Pretty soon, however, they will come across cases where 
a simple analysis by rows and columns does not give the full answer. 
The observations recorded in table 5 from a paper by A. Palazzi' are a 
case in point. 
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FIG. 2. FLUIDITY OF IRON PLOTTED AGAINST SILICON CONTENT; 
THREE REPLICATES. 
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The fluidity of iron was observed at 5 different levels of Si-content, 
in three replications (table 5). If we plot these observations (fig. 2) 
we observe a linear dependence of the fluidity on Si-content and an 
analysis in treatments and error is clearly unsatisfactory because this 
fact is not taken into account. Indeed if we introduce a linear com- 
ponent we find that nearly the full sum of squares which is in the first 
analysis attributed to 4 degrees of freedom for treatments is concen- 
trated into one degree of freedom for regression in the second case; 
thereby the degree of significance is enormously enhanced (see table 6, 
I and II). 

If we are so inclined the residual component can be split up into a 
number of parts testing separately differences between replications, 
between regression coefficients for the three replications, etc.; but this 
does not reveal any pronounced effects, as might have been surmised 
from the plot (table 6, III). 

Finally, for presentation to the factory, sums of squares and mean 
squares are again unsuitable; the actual equation giving the linear 
relationship between the fluidity and Si-content plus one estimate of 
error is technologically much more convenient. We then arrive at the 
result: 


F = 64.5 + 33.8(Si% — 1.75) + e;| 
s(e) = 5.0; y = 13; 
8(64.5) = 1.3; 
3(33.8) = 3.7 


(2) 


where s(64.5) and s(33.8) represent the standard errors in the constants 
64.5 and 33.8 of the regression equation as estimated from the residual 
error s(e) = 5.0. 


TABLE 7 
Self-inductance of coils with iron-oxide cores at varying temperatures of the bridge; 
the recorded data are % deviations from a standard. 


Temp. Coil 
°C 1 2 3 4 5 
21 1.400 0.246 0.478 1.010 0.629 % 
23 1.400 0.235 0.467 0.990 0.620 
24 1.375 0.212 0.444 0.968 0.495 
25 1.370 0.208 0.440 0.967 0.495 
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A third somewhat more complex case is shown in table 7. Here 
the selfinductance of some coils with an iron-oxide core were measured 
while the temperature of the measuring bridge was varied, the coil 
temperature being kept constant; % deviations from a standard were 
observed. 

When we plot the data (fig. 3) we see that with coil 5 some reading, 
or clerical, errors have occurred. This is a characteristic of industrial 
conditions. Clerical errcrs may always be expected and it is a useful, 
if not a necessary, practice to plot the data before a numerical analysis 
is started. In a treatment by purely numerical techniques clerical 
errors may easily pass unnoticed and cause serious distortions in the 
conclusions drawn. 

Discarding the observations on coil 5 as evidently unreliable, we see 
for the remaining 4 coils a linear temperature dependence and deviations 
from a straight regression line which seem to possess the same sign and 
magnitude for one temperature independently of the coil measured. It 
might of course be possible to describe the effects by means of a third- 
degree equation in the temperature but physically such a complex 
relationship in so narrow a temperature range is highly unlikely. 

In the experiment the same set of 5 coils were measured first early 
in the morning, and then at different times during the day while the 
temperature in the room was steadily rising. Each time a set of measure- 
ments was made, the zero-point of the apparatus had to be adjusted 


Coil. 1 4 5 
% % % % % 
1.45 0.55) 
c 
s 0.30 1.00 0.60 
> 
& 1.40}— 0.50 
| 0.25 0.95) 55 
e 
1.35 45 
0.20 0.90 0.50} 


FIG, 3. SELFINDUCTANCES OF 5 COILS. DEVIATION FROM A STANDARD PLOTTED 
AGAINST THE TEMPERATURE OF THE MEASURING BRIDGE, 
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afresh. Also the temperatures recorded were rounded off to whole 
degrees and there may be errors in the temperature of up to 0.5°C. 
Both the zero-point adjustments and the errors in temperature will 
introduce errors in the selfinductances which vary in a random manner 
between the 4 sets of observations, that is, between temperatures in 
table 7 and in fig. 3, but are constant over the 5 coils at a given tempera- 
ture. We shall call these errors the errors of adjustment. 


On the basis of this argument it seems reasonable to assume the 
model 


Zi, +¢7;+ 6+ 6; , (3) 


b; specifying differences in the general level between coils, 

C; specifying temperature coefficients which may vary from coil to coil, 
6; specifying the errors of adjustment, and 

«,; the errors of observation. 


The greek symbols 6 and ¢ have been used to distinguish the random 
variables from the systematic phenomena denoted by roman letters. 

The analysis of variance now runs as follows. First of all we may 
sum the observations on coils 1 to 4 for each temperature and carry out 
a linear regression analysis on the totals; this gives the sums of squares 
in the second row of table 8. Next we adjust a linear equation to each 
coil separately which gives the sums of squares in the first row of table 
8. The last row contains the difference of these sums of squares. 


TABLE 8 
Steps in the analysis of variance of the data in Table 7 (coil 5 excepted) according to 
the model (3) 


Sum of squares for 


Linear Residual DF. 
Regression 
For coils individually 0.003583 4 0.000440 8 
For sum over coils 0.003530 1 0.000376 2 
Difference 0.000053 3 0.000064 6 


In table 8 the total sum of squares, 0.003583, for linear regression 
has been split up into the amount 0.003530 with one degree of freedom 
for the regression common to all four coils and the amount 0.000053 
with 3 degrees of freedom corresponding to differences in the regression 
coefficients between coils. 
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This well-known technique now also applies to the residuals. The 
sum of squares 0.000376 with vy = 2 corresponds to residuals common 
to all coils at a given temperature; that is, to the random component 
6 in model (3). And the remaining sum of 0.000064 with » = 6 corre- 
sponds to the variation in the residuals from coil to coil; that is, with the 
errors of observations ¢ in model (3). 

Apart from these effects we also have evident differences in the 
general level from coil to coil, so that the final analysis of variance takes 
the form of table 9. 


TABLE 9 
Analysis of variance of the data of table 7 in keeping with the model (3). 
Source S.S. D.F. M.S. 
Linear component L 0.003530 1 0.003530 
Residual I (8) 0.000376 2 0.000188 
Between coils C 3.28 3 1.09 
Coils X Linear Comp. C X L 0.000053 3 0.000018 
Residual IT (e) 0.000064 6 0.000011 


The errors of adjustment 6 are the same for all 4 coils at one temperature. 
Hence these errors will influence the regression L common to all coils, 
but they will not influence the differences between coils or the C X L 
interaction. These last two effects must therefore be tested against 
Residual II, but the linear effects L must be tested against Residual I. 

The analysis shows that we have no reason to assume differences in 
the regression coefficients between coils. The model finally leads to the 
following numerical results. 


Coil Changes Z in selfinductance in % 
1 Z = 1.389 
= 0.2281 _ 9 o100(7' — 23)% (4) 
3 = 0.460 
4 = 0.986 


_ Reading errors s; = 0.0033%; v = 6 
Errors of adjustment s; = 0.007%; (v ~ 2) 


The estimate of the reading errors is simply the root of Residual II in 
table 9. To find an estimate for the errors of adjustment we must 
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recall that the Residual I in table 9 was obtained from the sum of 4 
observations, one for each coil. Hence the expected value of Residual 
Lis 

+o, 
and an estimate of o; is consequently given by 


aa 0.000188 — 0.000011 
$ 


= 0.000044. 


The correction for s: is very small so that we may approximately assign 
2 degrees of freedom to the estimate s; . 


4. The theory of two-way classifications 
The model usually applied to a two-way classification is 


+¢+64;. (5) 


Above we have, however, encountered several examples where this 
model is inadequate and a further extension of the theory seems ap- 
propriate. 

First of all we must distinguish between classifications according to 
known and unknown levels*. 

If we wish to investigate differences in electron emission of radio 
valves when the cathodes have been made from different batches of 
raw materials we have a classification by unknown levels. If, however, 
‘these raw materials have been chemically analyzed and classified 
according to their content of reducing impurities we have a classifica- 
tion according to known levels. In the first instance we only ask whether 
there are differences between batches, while in the second instance we 
try to relate these differences to 2 measured chemical characteristic of 
the batches. 

Likewise in the example of table 7 the classification by temperature 
is by known levels and the classification by coils is by unknown levels. 

- We may now further distinguish between three different kinds of 
two-way classifications: 


(A) unknown levels in both directions; 
(B) known levels in one direction; 
(C) known levels in both directions. 


Different models must be applied to each of these situations. 


*O. L. Davies and his colleagues* have introduced a distinction between quantitative and qualitative 
factors. This paper was prepared before their book was available and it may be interesting to note that 
we have both independently felt the need for this distinction. 
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5. Two-way classifications with unknown levels in both directions 


This is the case most commonly considered in textbooks and analyzed 
into a mean square for rows, for columns, and for residual. The model 
customarily assigned to this design is given by (5). Here we would, 
however, like to propose a change, replacing (5) by 


Z,=at+b; +e, +45; + €; (6) 


where d;; represents the interaction, that is, a systematic component 
depending both on row and column and due to non-additivity of the 
row and column effects, while ¢;; is the purely random component; a 
model that was also used by Tukey’. 

In industry one sometimes has to analyze two-way experiments 
which have been performed without replication. It must then be borne 
in mind that the residual contains the pooled effect of interaction and 
random fluctuations. It is an experiment in which these two components 
are confounded. The statistical significance will in such cases be under- 
estimated because the estimate of error is biased in the direction of 
higher values by the interaction term. 

Whether or not it is desirable to separate interaction from the 
random fluctuations depends on the situation envisaged. If we find 
pronounced and highly significant differences between rows and columns 
it will often be of small interest to know whether there is interaction, 
because this interaction is evidently small compared with other effects 
present and hence is of no technical importance. 

A separation of the interaction and the random component can of 
course be carried out by replication, because in replications the inter- 
action is repeated while the random elements change from one replicate 
to another. 

A clear distinction between interaction and random components is, 
we believe, essential for a correct understanding of more complex cases. 

In this connection a definition of the term replication would also 
seem desirable. In textbooks this term is often introduced without 
further explanation. To judge from its use in statistical analyses, 
replication is equivalent to the introduction of an additional factor of 
which we are convinced that it does not interact with the other factors 
of the experiment. 

Replication on different blocks in one field satisfies this definition. * 
We expect differences between blocks but no block-treatment inter- 
actions; these never occur in the analysis. 

But in industry replications often mean repetitions of the experiment 
on different days. In the meantime conditions may have changed, say, 
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by changes in raw materials, by exhaustion of a chemical solution or 
otherwise, and we can never be quite sure that such changes do not 
influence the outcome of an experiment otherwise than by the general 
level only. We have occasionally observed interactions where they 
were not suspected at all. It may sometimes be wise to verify by an 
appropriate analysis whether the replications can be truly considered 
as such; and the term should not be used too loosely and should be 
properly defined. 

An alternative technique for discovering interaction without repli- 
cation will be discussed in §11. 


6. Two-way classification with known levels in one direction 


The examples presented in tables 5 and 7 are cases in point. From 
the accompanying discussion it will be clear that we now have the choice 
between a variety of models. If Y; (j = 1, --- n) signify the known 
levels for the n columns we can introduce a linear component cY; into 
our model common to all the rows, or alternatively linear components 
c;Y; with different higher-order components can be introduced in 
similar fashion. 

As a rule, however, the variations in the levels are comparatively 
small and we need not go to higher powers than the second. As a matter 
of fact cases with a linear component occur quite frequently; cases 
where a significant quadratic component is observed are occasionally 
encountered in the literature, but we are not aware of any case where a 
cubic term was required; these are anyhow very rare. 

As a general model for the design under consideration we may 
therefore pose 


Zi (7) 


In some cases, as in table 7, it may be necessary to add a term 7; to 
account for random errors which are constant through each column; 
in that case the columns are classified by a mixture of known and 
unknown levels. In how far the model (7) can be simplified by omitting 
the index 7 from b, c or d, or by dropping the quadratic term will depend 
on the situation. 

The customary procedure to treat a two-way classification of the 
present type is by first analyzing into rows and columns and by intro- 
ducing a linear or quadratic component only when significant differences 
between columns have been found. 

That this procedure is not quite satisfactory is demonstrated by the 
example of table 5. There an analysis into columns gave a mean square 
of 544 for 4 degrees of freedom, while the introduction of a linear 
component gave a mean square of 2125 for one degree of freedom and a 
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greatly enhanced level of significance. It may therefore well happen 
that we do not find significant differences between columns, where we 
would find a clearly significant linear component. 

Hence the linear component should be introduced into the model 
right at the outset. 


7. The two-way classification with known levels in both directions 


In this situation the method of the previous section can be further 
extended. If X; (¢ = 1, 2, --- , m) and Y; (j = 1, 2, --- , m) denote 
the known levels for the m rows and n columns respectively, we can 
introduce not only terms in X, X’, Y, and Y’ into our model but also 
mixed or interaction terms of the form XY, X’Y, etc. The full model 
up to the second degree becomes 


Zi; = Ao + + + + + + (8) 


The most convenient way to apply such a model is by means of orthog- 
onal polynomials*. Let &(X,) (k = 0, ---, m — 1) be the m orthogonal 
polynomials of degree k in X, , and n,(Y;) (l = 0, --- ,m — 1) similarly 
the n polynomials in Y,; of degree 1; then the products 


m( Y;) (9) 


define mn two dimensional polynomials in X and Y which are mutually 
orthogonal over the complete set of mn combinations of X; , Y;. The 
regression coefficients are 


(10) 


and the corresponding sums of squares 
{ 


(11) 


Vu = 


In principle we can thus split the total sum of squares into mn different 
portions** each with one degree of freedom; as stated it will in practice 


*The use of orthogonal polynomials for an analysis in two or more dimensions, though not common, 
is not new. The technique is, for example, described by DeLury.‘ 

Usually the total sum of squares 0s )°; (Zi; — Z..)? is associated with (mn —1) degrees of free- 
dom, the one degree of freedom for the general average Z.. being disregarded because it is not of interest. 
In the present instance it is theoretically more convenient to retain this degree of freedom as part of our 
general argument. It might be used to test whether the average Z.. differs significantly from zero, but in 
most situations such a test is quite superfluous. For the same reason it is appropriate to consider poly- 
nomials of degree zero as part of the complete set of orthogonal polynomials; they are of course constants 
and may be taken equal to unity: £0 = mo = 1. 
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seldom be necessary to go beyond terms of the second degree. As an 
example of this type of analysis let us consider the data in table 10, 
giving the relative changes observed in a life test with resistors of 
different resistances and nominal wattages. 


TABLE 10 
Relative changes in resistance in per mil; each value is an average observed on 
5 resistors. 
Wattage Resistance 
100 200 500 1000 2000 2 
Changes in resistance = Z;; 
1/8 2.8 2.6 3.0 2.5 2.1 2.6 
1/4 1.9 2.1 4.1 2.2 17 2.4 
1/2 3.2 2.8 1.9 3.4 4.2 3.1 
1 2.0 3.4 3.1 3.6 3.9 3.2 
2 2.1 2.6 3.4 4.3 6.1 3.7 
Zi. = 2.4 2.7 3.1 3.2 3.6 3.0 


An ordinary analysis of variance yields the following result. 


TABLE 11 
Ordinary analysis of variance of the data in table 10 
Source SS. MSS. 
Resistances 4.30 4 1.07 
Wattages 5.30 4 1.32 
Residual 14.88 16 0.93 


There are no clearly significant effects. If we compute the residuals 
term by term, however, we obtain the results given in table 12. 


TABLE 12 

Residuals of table 10, when row and column differences have been eliminated. 
100 200 500 1000 2000 2 

1/8W +0.8 +0.3 +0.3 +0.3 -1.1 
1/4 +0.1 0.0 +1.6 —0.4 —-1.3 
1/2 +0.7 0.0 -1.3 +0.1 +0.5 
1 —0.6 +0.5 —0.2 +0.2 +0.1 
2 —1.0 —0.8 —0.4 +0.4 +1.8 
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These residual terms indicate that the analysis of table 11 is not 
quite adequate; for positive and negative residuals are clearly con- 
centraved along the two diagonals of the table, which cannot easily be 
understood if they are the result from purely random variations. 

The values of the wattages are in geometrical progression and the 
values of the resistances very nearly so; it seems therefore reasonable to 
specify the levels of rows and columns by log W and log R respectively. 
We then have equispaced levels which can be represented by X; = — 2, 
— 1,0,+ 1, + 2, and Y; = — 2, — 1,0, + 1, + 2, and we may use 
the existing tables of orthogonal polynomials*’®. Carrying out the 
analysis on this basis up to the second degree we obtain the result of 
table 13. 


TABLE 13 
Analysis of variance of the data of table 10 by means of orthogonal polynomials 
Source Polynomial | Leading S.S. DF. MS. 
term 
General average Eono 1 225. 1 225. 
4.50 1 4.50 
Wattages ono X? 0.23 1 0.23 
tom ¥ 4.50 1 4.50 
Resistances tons y? 0.003 1 0.003 
Interaction | tim ay 7.13 1 7.13 
Residual 814 19 0.44 


Whereas the differences between rows and columns were not sig- 
nificant in the analysis of table 11, we now find linear terms in both 
directions which are very clearly significant. In addition we find an 
interaction term which is also highly important; by removing this 
component the residual mean square has been halved. We conclude 
that the changes in resistance values can be represented by a simple 
equation comprising terms in X, Y, and XY, the terms in X? and Y? 
being clearly unimportant. Numerical computation yields the equation: 


R = + + + 0.267%.n, per mil (12) 
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R = 3.0 + 0.30X + 0.29Y + 0.267XY per mil, 
and the residual standard deviation is s = 0.63 per mil. If we calculate 


the expected changes from this equation and subtract these from the 
original observations we obtain the residuals entered in table 14. 


TABLE 14 
Residuals of the data of table 10 with respect to the model 12 


100 200 500 1000 2000 2 
1/8 W —0.09 —0.04 +0.60 +0.34 +0.19 
1/4 —0.75 —0.58 +1.40 —0.52 —1.05 
1/2 +0.78 +0.09 —1.10 +0.11 +0.62 
1 —0.19 +0.66 —0.20 —0.26 —0.51 
2 +0.15 —0.18 +0.20 —0.12 +0.85 


The systematic distribution of signs noted in table 12 has now dis- 
appeared. 

Of course the assumption that the levels for rows and columns are 
proportional to log W and log R is rather an arbitrary one. It leads 
to a satisfactory description of the data and from the point of view of 
the user of the resistors this may be considered a sufficient a posteriori 
justification. When, however, we are investigating the physical or 
chemical processes causing the changes in resistance the problem is 
quite a different one. It must then be borne in mind that there may be 
other ways of specifying the levels which lead to equally satisfactory 
presentations of the observations but are to be preferred in view of our 
concepts of the underlying processes. 

Since analyses of this type are not very common a second example 
may be appropriate. It is provided by the observations on the thickness 
of aluminum-oxide layers recorded in table 1 and is of interest since it 
is not in this case the XY interaction which is of importance. 

To treat heights H and positions P as unknown levels, as we did in 
tables 2 and 3, is not really satisfactory and it seems preferable to intro- 
duce known levels to describe the dependence of layer thickness on 
these two factors. Since both heights and positions are equally spaced 
we may again use the tables of orthogonal polynomials. Table 15 gives 
the final result of such an analysis. 
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TABLE 15 
Analysis of variance of the data of table 1 by orthogonal polynomials 


Source | Polynomial |; Leading S.S. D.F. MSS. 
term 
Heights 810 1 810 
ono X? 480 1 480 
Positions Eons y? 453 1 453 
Interactions Eine xy? 603 1 603 
X?Y 346 1 
Residual 366 9 at 1s = 10 


By the symmetry of the positions the absence of Y(é,) can be im- 
mediately understood. And the significant X Y* term (&,7.) is explained 
if we assume that the linear effect with height for the two outer positions 
(P, and P;) differs from that in the centre. Though the X’Y(ém) 
term nearly reaches the 1% level of significance, an effect of this type 
is technically difficult to explain. For this reason one may be inclined 
to disregard this term and pool it with the residual. If we do so the 
adjusted model is 


Z= 137+ no + — 4.6&:n2 (13) 


and the residuals with respect to this model are those of table 16. That 
this model is not quite adequate is revealed by a somewhat systematic 
distribution of signs. This could be remedied by including a £,, 


TABLE 16 
Residuals i\/n wu when the data from table 1 are explained by the model 13. 


Positions 
P, Ps Py Ps 
A, -—9.6 -—0.7 —1.4 +3.3 +8.4 
eights +3.6 +17.7 —8.6 —4.4 
Hy; —4.2 —2.9 +2.2 +1.1 +3.8 


component in the model (13). With respect to this term statistical 
and technical arguments contradict one another and it would be best 
to carry out further experiments before reaching a decision. 
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8. The function of the analysis of variance 


As pointed out earlier sums of squares, degrees of freedom, and 
mean squares are not the most suitable form for representing the results 
of a statistical analysis to technical-minded people. Concrete numerical 
equations describing the observations plus standard deviations for the 
residuals these equations do not explain, is what they need. For 
technical purposes the model expressed in numerical form is of greater 
interest than the analysis of variance. What then is the function of the 
analysis of variance in the entire analytical procedure. 

In textbooks it is usually stated what is the model pertaining to a 
given experimental design. This, we believe, is a statement which 
may lead to misinterpretation and which therefore requires some 
further discussion. 

If we set out to investigate a situation where several factors are 
varied we do not know beforehand what will be the correct model to 
describe our observations. This is exactly what we wish to find out and 
the result will depend on what effects turn out to be significant. We 
must therefore distinguish between two kinds of model, which we may 
suitably term the statistical and the practical model. 

The model of the textbooks belonging to a specific type of experi- 
mental design is the statistical model. It is the most complex model 
with which the design can adequately deal, the analysis giving one mean 
square for each of the components in the model. 

The practical model is the simplest model which provides an adequate 
description of the observations. When we start the experiment we 
consider a great variety of conceivable practical models. We must 
then construct a statistical model which includes all the conceivable 
practical models and we must design our experiment accordingly. 

Looked at from this point of view the analysis of variance serves 
two purposes. ; 

The main function of an analysis of variance is to decide between a 
variety of conceivable practical models. Each mean square in an analysis 
of variance corresponds to a definite term in the equation expressing the 
model; and whenever a mean square is not significant the corresponding 
term can be cancelled and the model simplified. Thus by means of the 
analysis of variance we are able to envisage a great variety of different 
models and to pick from these at a glance the simplest model that will 
adequately describe our observations. The examples given above pro- 
vide illustrations of the procedure. Many others may be taken from 
the literature. 

The second function of the analysis of variance is to provide an estimate 
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of the residual fluctuations, not accounted for by the practical model finally 
adopted. 

Combining these various remarks we may now set up a few simple 
rules for using the analysis of variance technique in industry. 


A. Plot the data 


By looking at a plot we can often rule out some models as evidently 
unsuitable and thereby simplify the analysis. Also in industrial ex- 
periments clerical errors or gross errors of observation are not uncommon. 
A graphical presentation may help to spot and remove outliers, which 
otherwise may seriously warp our conclusions. The experiment in 
table 7 and fig. 3 provides an example. 


B. Make up a set of conceivable models 


This should be done on the basis of our technical knowledge and 
the graphical presentation of the observations. 


C. Decide between the conceivable models by means of an analysis of 
variance and find the residual variance. 


D. Express the model chosen in the form of concrete numerical equations. 


In this form the result of the analysis should be sent back to the 
technicians in the factory. 


9. Levels of significance 

The procedure just outlined may be the subject of statistical 
criticism. In fact if we use the analysis of variance to test some null- 
hypothesis it is not permissible to adjust the model to be tested to the 
observations; the model must be fixed beforehand. 

This is of course true. Models adjusted to the observations will 
generally have a better chance of fitting than predetermined models; 
that is by the adjustment the probability of obtaining significance will 
be increased. 

It is not likely, however, that this increase will be such as to invalidate 
the conclusions drawn. On the whole critical testing of significance is 
of secondary interest. Usually the analysis of variance mainly serves 
to provide a general survey as to what effects are pronounced but a 
final decision which of the significant effects are to be incorporated in 
the model will often be made rather by technological than by statistical 
argument. 

Hence if by the procedure of the previous section the levels of 
significance are somewhat violated, this should not be a matter of 
serious concern. 
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10. Normality; computing the residual 


This remark also furnishes a reply as to conditions of normality. 
Tests of significance usually assume that the random fluctuations are 
normally distributed. But considerable deviations from normality 
will cause only comparatively slight changes in the significance levels; 
so we need not be seriously concerned on that score. 

Deviations due to reading or clerical errors, or to outlying obser- 
vations caused by abnormal experimental conditions are more serious. 
We have already pointed to the importance of plotting the data before 
embarking on a numerical analysis in order to detect outliers. 

An alternative method for checking is by computing the elements of 
the residual as we have done in tables 4, 12, 14, and 16. Sometimes 
the distribution of positive and negative signs among the residuals 
indicate that the model is not quite adequate; tables 12 and 16 furnish 
examples. Also the values of the residuals may assist in locating 
outliers; for example the residual of 17.7 in table 16 is a bit high; it is 
more than twice the estimated standard deviation of a single observation 
(s = 8.4; » = 10). This, however, is not a correct test. The value 
17.7 is a residual with respect to the adjusted equation 13 and the 
standard deviation of such residuals is on the average smaller than 
that of the observations themselves. Besides the 17.7 residual is 
included in s, and if it is abnormally high it will have produced an 
abnormal increase in this estimate. It would not be easy to give a 
precise test, but we may certainly consider the residual 17.7 as suspect. 

If the residuals are sufficiently numerous they can be plotted in a 
histogram which should approximately have the shape of a normal 
distribution. In many instances we have found the computation of 
the residuals not too laborious and really helpful in checking and 
understanding the underlying analysis. 


11. Testing for interaction ath 
levels in both directions 


As pointed out in §5 we can separate interaction from random 
fluctuations by replication. We will now briefly discuss an alternative 
method which has been proposed by Tukey® and Ward and Dick’. 

For known levels in both directions the leading interaction term is 
given by the product X,Y, as discussed in §7. If we do not know the 
levels we can still test for interaction by using the leading differences 
(Z;. — Z..) and (Z.; — Z..) computed from the observations for rows 
and columns as estimates for X; and Y;. In keeping with the methods 
of §7 we then obtain a regression coefficient. 
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14 
and a corresponding sum of squares 
(Z;. — Z..(Z.; — Z.)Z;\? 
Vin = (15) 


{(Z;. — — Z.)}" 


This is Tukey’s method. A more satisfactory procedure would be to 
adopt the model 


g + a; + b; + ca;b; + , (16) 


as proposed by Ward and Dick’. This model can not be adjusted in a 
simple way, but Ward and Dick have developed an iterative procedure, 
the first stage of which is equivalent to Tukey’s method. Since Ward 
and Dick have not provided numerical examples of their method it 
seems appropriate to do so here. The formulae are given in the 
Appendix. 

First let us reconsider the data of table 10 interchanging rows and 
columns in random order so that the systematic variations in row and 
column averages are lost (table 17). To fix our thoughts let us imagine 
that we have five batches of the same type of resistors which have been 
subjected to a heat treatment on five different occasions. It is quite 
conceivable that variations in humidity during production interact 
with humidity variations during the heat treatment; Tukey’s or Ward 
and Dick’s procedure might reveal such interaction. 


TABLE 17 
Data of table 10 with rows and columns interchanged in random order and ascribed to 
imaginary factors. 
Batches of 1 2 3 4 5 


resistors 


Per mil changes in resistance 


| 4 
| 1 1.9 4.1 2.4 . 
| ‘Heat 2 2.0 3.6 3.4 3.9 3.1 3.2 7 
| ‘Treatment 3 3.2 3.4 2.8 4.2 1.9 3.1 
4 2.1 4.3 2.6 6.1 3.4 3.7 
5 2.8 2.5 2.6 2.1 3.0 2.6 
| 2.4 3.2 3.7 3.6 
| 
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In table 18 the mean squares resulting from successive stages of 
Ward and Dick’s analysis have been entered together with those 
resulting from our previous analyses; table 19 contains the successive 
values of the constants a, , b; and c. 

If we introduce linear components as in table 13 we find significant 
effects for rows, columns, and interaction. If we analyze into rows and 
columns, the significance of row and column differences disappears 
because the greater part of a sum of squares which was first concentrated 
‘into one degree of freedom is now spread out over 4 degrees of freedom. 


TABLE 18 
Ward and Dick’s analysis applied to the data of table 17; mean squares in (per mil)? 


Analysis Columns = Rows = 
Batches Treatments | Interaction Residual 


M.S. D.F.| M.S. D.F.| M.S. D.F.| M.S. D.F. 


With linear components 

as in table 13 4.20 1 | 4.50 1 | 0.40 21 
Ordinary analysis into 

rows and columns 1.07 4 | 1.32 4 —- — {0.93 16 


Ward and Dick’s analysis 
Ist Stage = Tukey’s 


method 1.07 4 | 1.32 4 | 6.15 1 |0.58 15 
2nd Stage 1.07 4 | 1.32 4 |12.49 1 |}0.16 15 
3rd Stage 0.89 4 | 1.31 4 | 9.61 1 {0.41 15 
4th Stage 0.87 4 | 1.31 4 |10.51 1 |0.35 15 
5th Stage 0.88 4 | 1.32 4 /10.58 1 |} 0.34 15 


But Tukey’s analysis still reveals the interaction though somewhat 
less pronounced than before. 

The significance of this interaction is greatly enhanced in the sub- 
sequent stages of Ward and Dick’s analysis. 

In the second stage the mean square rises to 12.49 but is reduced 
again in the later stages. This may seem surprising but it should be 
borne in mind that these mean squares are computed from formulae 
which are correct only when the constants in the model have been fully 
adjusted, but do not hold for the intermediate stages of approximation. 
Hence we must judge by the final results. 

It will be seen from tables 18 and 19 that after 4 stages we reach a 
reasonable constancy; the changes from the 4th to the 5th stage are 
relatively unimportant. 
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TABLE 19 
Adjusted constants when Ward and Dick’s analysis is applied to the data of table 17. 
Stage in the analysis 
Ist 2nd 3rd 4th 5th 
4, | —0.6 —0.460 | —0.538 | —0.536 | —0.550 per mil 
Rows = é, | +0.2 +0.162 | +0.137 | +0.142 | +0.138 
Heat treatments 4; +0.1 —0.009 | +0.129 | +0.117 | +0.140 
é& | +0.7 +0.795 | +0.720 | +0.726 | +0.716 
é, | —0.4 —0.488 | —0.448 | —0.449 | —0.443 
6, | -0.6 —0.489 | —0.407 | —0.380 | —0.383 
Columns = 6. | +0.2 | 40.230 | +0.180 | +0.185 | +0.187 
Batches 6, | -0.3 —0.247 | —0.210 | —0.205 | —0.206 
6& | +06 +0.792 | +0.643 | +0.659 | +0.666 
6, | +0.1 —0.286 | —0.206 | —0.260 | —0.264 
Interaction é +0.260 | +0.370 | +0.360 | +0.380 | +0.378 


From a technical point of view it is to be observed that when in a 
practical case with unknown levels in both directions we find such a 
pronounced interaction as in the last row of table 18 but no significant 
effects between rows or columns, this may be taken as a strong indication 
that definite factors have been operative in the rows and columns, and 
that we have failed to find row and column effects only because these 
factors have not been taken into account and their effect has been spread 
out over too many degrees of freedom. 

A second instructive example is provided by the observations on 
layer thicknesses of Al,0; coatings recorded in table 1; the final results 
after a five-stage iteration are given in table 20. 

In the complete analysis by orthogonal polynomials (table 15) we 
found £7. and £7, to be the two significant interaction components 
with a total sum of squares of 949. Of this only the amount 512 is 
revealed by the Ward and Dick analysis which does consequently not 
detect interaction effects as effectively as does the analysis of §7. To 
study this point somewhat more in detail the analysis by orthogonal 
polynomials was also applied to the residuals with respect to Ward and 
Dick’s model, computed with the aid of the adjusted constants of table 
20A. This gave the results recorded in table 21. 
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TABLE 20 
Final results obtained by applying a 5-stage Ward and Dick analysis to the data 
of table 1 


A. Adjusted constants, in microns 


Heights Positions Interaction 
sb, —8.38 
— 6.46 +2.82 
sy — 6.45 sbs +6.89 sé = 0.1124 
os +13.00 she +3.44 
505 —4.77 


Source S.S. D.F M.S 
Heights 1268 2 634 (microns)? 
Position 480 4 120 
Interaction 512 1 512 
Residual 798 7 114 


TABLE 21 


Sum of squares corresponding to the £i2 and £m interaction components in the origi- 
nal data of table 1 and in their Ward and Dick residuals 


Sum of squares 
Component 
Original Ward and 
data Dick 
residual 
ine 603 45 
&m 346 298 


We see that the Ward and Dick analysis has chiefly removed the 
£,n2 interaction and not the £27, interaction. 

In this connection it is to be noted that in table 15 and the model 
(13) of the “pure” terms, containing X or Y alone, only &,7 , 290 , and 
fon. occurred, while the linear term in Y, £7, , is missing; the sum of 
squares for this term was very small, 11” only. Table 21 therefore 


B. Mean squares | 
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suggests that the Ward and Dick analysis reveals interactions of the 
type &, only if the corresponding pure components £9, and &o are 
both pronouncedly present. 

This is reasonable enough, though our attempts to prove it have 
failed. If we represent the observations by their orthogonal com- 
ponents, Z = }°>> 1,; &n, , it is fairly easy to show that interactions 
t,n,(k > 0,1 > 0) are contained in the first stage estimates ,4; and 5, , 
and contribute to the interaction constant ,é only if both the pure com- 
ponents & 7 and £m, are contained in Z. What happens at later stages 
becomes complex, however, and is not mathematically very tractable. 

It should also be observed that Tukey’s analysis yielded a mean 
square for interaction of 190” against 512y” for Ward and Dick’s. 
The latter method of analysis is therefore in this instance much more 
effective. 

Of course the applications we have made above are somewhat 
artificial, because we have applied an analysis for unknown levels to 
cases where in reality the levels were known. How often cases may 
occur where Ward and Dick’s analysis will reveal interaction effects 
which could not be discovered otherwise, it is difficult to say. But 
there can be little doubt that this method is an interesting addition to 
our arsenal of statistical techniques, which it is well worth trying out 
in practice. 


12. Conclusions 


This paper does not contain any material that is essentially new. 
Examples of the various types of analysis of §3, and 7 have occasionally 
been published, but they have never been brought together and dis- 
cussed from a unifying point of view. 

Even a two-way classification may give rise to a great variety of 
models and analyses. All the situations described occur in industrial 
applications and to deal with industrial problems we must have the 
whole gamma of techniques discussed above readily available. In this 
respect the treatment of the two-way classification in existing textbooks 
is not quite satisfactory. 

It will be clear that if we pass on to three-way or four-way classi- 
fications the variety of cases increases tremendously, so much so that a 
systematization of conveivable models and methods of analysis seems 
almost impossible. The situations encountered in practice are never 
exactly the same, and in each case we have to think out afresh what 
models and what kind of analysis can best be applied. Sometimes it 
even requires some trial-and-error analysis before a satisfactory solution 
is obtained. 
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But in order to solve successfully these complex problems the l 
implications of a two-way classification should first of all be fully 
understood. The present paper was written with this point in mind. 
. I wish to conclude by expressing my sincere thanks to Mr. A. M. van 
Beek for his valuable assistance in carrying out the numerical analyses. 


Eindhoven, December 3rd, 1954 


LITERATURE 


1. A. Palazzi, La progettazione fattoriale delle ricerche industriali; exempi di 
applicazione. La Metallurgica Italiana 6, 215-220, 1952. 

2. O. L. Davies a.o., Design and analysis of industrial experiments, Oliver and Boyd, 
Edinburgh, 1954. 

3. J. W. Tukey, Interaction in a row-by-column design, Memorandum Report 18, 
Statistical Research Group, Princeton University, 1949. T 

4. D. B. DeLury, Values and integrals of the orthogonal polynomials up ton = 26, 
University of Toronto Press, 1950. : 

5. R. A. Fisher and F. Yates, Statistical tables for biological, agricultural and medical 


research, Oliver and Boyd, Edinburgh, 1948. 7 
6. J. W. Tukey, One degree of freedom for non-additivity, Biometrics 5, 232-242, 
1949. 
7. G. C. Ward and I. D. Dick, Non-additivity in randomized block designs and bal- y 
anced incomplete block designs, New Zealand Journal of Scier.ce and Technology 33, 


430-435, 1952. 


APPENDIX: The iterative procedure of Ward and Dick 


Model: Z=gta, +b; + ca,b; + «; (17) 
Symbols: 1 
Z;; = the observations; 

t = 1,2, --- , m indicating rows, 

j = 1,2, --- , indicating columns A 
m = number of rows, es 
nm = number of columns. 
S, = z Z;; = Sum of the observations in the i-th row, P 
S.,; = >» Z;; = Sum of the observations in the j-th column, 

S.. = Z;; = Sum of all observations, 
24; al 
,b; 2 estimates of a, , b; and c obtained after k iterations cc 
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Formulae 
S,. — S../m + — 
The estimate of g is 
g = S../mn. (21) 
The iterative process is started with 
of = 0. (22) 


We then compute ,4,; , ,5; , from these ,é and so on. A check on the 
computations at each stage is provided by the equations 


= 0, (23) 


> +b; = 0. (24) 


The reduction in the sum of squares is given by 
R = gs.. p a;S;. + é (25) 
As pointed out in §11 this formula only holds exactly for the final 
estimates 4; , 6; , é but is not correct for intermediate estimates 
4; 4D; ’ 2¢. 
Example 
As an example we consider the data of table 1. First of all we find 
§ = 137, (26) 


and all further computations may he simplified by subtracting a 
constant, for instance 100, from each datum. The basic data then take 
the form of table 22. 
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TABLE 22 
Basic data for computation according to Ward and Dick’s model. 
J Si. S;. — S../m 

t 25 30 28 34 43 160 —25 

{ 26 50 27 24 18 145 —40 
30 55 68 59 38 250 +65 

8.; 81 135 1128 117 99 555 
S.; —S../n =| -30 +424 +412 +6 -—12 =5S.. 


i m = 3; n=5 


The successive computations may then conveniently be arranged as in 
table 23, which may be continued towards the right for subsequent 


stages. 
TABLE 23 
Computing scheme for the Ward and Dick analysis. 
1 —25 -§ —2 | — 6.49 
2 —40 — 8 224 3132 — 6.72 
3 +65 +13 378 J +13.21 
3.5.3132 _ 
15;8.; = 600 600.1290 
j = S.; S../n 1b; 14,Z;; => 0.0607 2b; 
7 1 —30 —10 57 —10.68 
44 4 424 +8 165 + 4.65 
3 +12 + 4 528 3132 + 7.19 
ma 4 + 6 +2 405 + 3.78 
: 5 —12 — 4 135 — 4.93 
m = 3,n = 5. 


1d; 5, Zi: is computed 14; and as b; 14; 
Z;,; to check the arithmetic. 
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THE EXPLORATION AND EXPLOITATION OF 
RESPONSE SURFACES: 


AN EXAMPLE OF THE LINK BETWEEN THE FITTED 
SURFACE AND THE BASIC MECHANISM OF THE 
SYSTEM 


G. E. P. Box ano P. V. 
Imperial Chemical Industries Limited, 
Dyestuffs Division Headquarters, 
Blackley, Manchester, England 


This is a sequel to an article which recently appeared in this journal 
[1] and had the same general title. The previous article described a 
number of applications of newly developed techniques [2] for the study 
of response surfaces. The present article shows how study of the form 
of the empirical surface can throw important light on the basic mechan- 
ism operating and can thus make possible developments in the funda- 
mental theory of a process. This idea is illustrated in some detail with 
an example previously discussed only from the empirical standpoint. 
A theoretical surface, based on reaction kinetics is now derived, rate 
constants are estimated from the data and the theoretical surface is 
compared with the empirical surface previously obtained. It is then 
shown how the canonical variables of the empirical surface can relate 
to the basic physical laws controlling the system. In this connection 
the problem of suitable choice of metrics for the variables is discussed. 
In a final section some general remarks on the process of scientific 
investigation are appended. 


I. INTRODUCTION 
A response surface is a graphical representation of a relationship 


between some response such as yield, whose level is denoted by 7, and a 
number of quantitative variables (or factors), such as temperature, 
time and concentration, whose levels are denoted by x, , 22, +++ , 2%. 

The feature of the surface of greatest interest is often the value or 
values of the variables xz, , r. , --- x, for which 7 is a maximum. 

In the previous paper it was emphasised that the study of numerous 
examples had indicated that sharply defined point maxima appeared to 
be something of a rarity. The typical situation was that in which 
the response was found to be insensitive in the region of the maximum 
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to certain joint changes in the levels of variables, indicating the exis- 
tence of ‘factor dependence’. An extreme form of this phenomenon 
occurred where there was a line, plane, or space of near-maxima rather 
than a single point maximum. Such a response surface was said to 
contain a ‘stationary ridge system’. A second type of surface of common 
occurrence contained a rising ridge. It was suggested that the nature 
of a ridge system could indicate the physical laws which underlay the 
process studied. 

The method recommended for exploring a response surface con- 
sisted first of performing a simple pattern of experiments designed to 
detect, in the initial region explored, any general sloping tendency of 
the surface. If such a tendency was found, further experiments were 
performed in the indicated direction of increasing response. Either 
initially, or after one or two cycles of this ‘steepest ascent’ procedure 
had brought the experimenter to a region of higher response, it was 
usually found that no sloping tendency could be detected and exploited. 
The region so attained was then examined by performing a slightly more 
elaborate pattern of experiments and fitting a suitable function which 
enabled curvature in the surface and dependence between the variables 
to be taken account of. 

In the absence of prior knowledge concerning the form of the response 
function, a local representation could be obtained by fitting a poly- 
nomial in x, , 2, -*+ , t, , in which all terms up to a given order d were 
included. This was of course equivalent to supposing that the true 
function could be locally represented to a sufficient approximation by 
its Taylor series ignoring terms of order higher than d. 

In the majority of applications, where the object was not so much 
to graduate the response surface accurately but rather to determine 
approximately its general characteristics in the optimum region, an 
equation of only second degree has usually been adequate.* Reduction 
of this fitted second degree equation to canonical form has allowed 
the nature of the fitted surface to be readily appreciated and has 
indicated in what regions further experiments were necessary. 

It has been found that: 

(1) This approach has made it possible to comprehend features of 
the surface which could be exploited to attain further gain when 
possibility of improvement by simpler means had been exhausted. 

(2) By considering the features of the surface for the principal 
response such as yield, or cost, in relation to the features of the surface 
for ‘auxiliary’ responses such as purity, it has been possible to discover 


*Where more accurate graduation was required (as for example in the work on pulse columns 
performed for the Atomic Energy Commission) an equation of third degree has been used [4] [5]. 
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conditions which were ‘best’ in the practical sense of bringing all the 
responses to ‘most satisfactory’ compromise levels. 

(3) Consideration of the shape of a fitted response surface has 
suggested new theories of behaviour of the system. 

It is this last aspeet which we shall here discuss further. 

In the analysis of the fitted second-degree equation the existence of 
a ridge is indicated by one or more of the coefficients in the canonical 
form of the equation being small in comparison with the others. Where 
these small coefficients are negligible it is implied that the system in k 
variables can be more economically described in terms of less than k 
‘anonieal or ‘compound’ variables. It appears that these compound 
variables can have greater significance than a purely representational 
one. In fact they can indicate the fundamental mechanism of the 
system. To make this clear we first consider a simple hypothetical 
example. 

Suppose that the effect on yield of the concentrations ¢, and ¢, of two 
reactants were being studied and that previous experimentation had 
suggested that we should now explore the ranges of concentration: 
c, = 50-60 grams per litre and c, = 30-40 grams per litre which were 
expected to be near their optimum values. It is usually simplest in 
such examples to work with coded values of the variables and we will 
suppose that ‘standardised variables’ were chosen as follows: 


— 55)/5 = (c, — 35)/5 
so that the region explored with suitably placed experiments was 
defined by 
<2, < +l, 
Suppose finally that the fitted second degree equation was 
= 78.56 + 0.50r, — — 2.310) — 2.152; + 4.0822, 
and that the errors of estimate of the coefficients were sufficiently small 
so that the equation was as a whole meaningful. 
Now this, like any other second degree equation, can be written in 
canonical form. (That is by changing the origin and rotating the 


co-ordinate axes we can write it in form containing only quadratic 
terms). In the present case the equation, written in this way becomes 


Y = 78.63 — 4.27N7 — O.19N3 | 
where N, = 0.720, — 0.69x, — 0.06 (1) 
N. = 0.692, + 0.724, — 0.53 | 
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and these last two equations define the positions and directions of the 
new coordinate axes. 

The centre of the system (that is the point Y, = 0, VY, = 0) has co- 
ordinates x, = O41, 7, = 0.34. Thus the axes of the system defined 
by the lines XY, = 0 and XY, = 0 pass close to the original origin and 
through the region in which the experiments have been performed. A 
discussion of the surface in terms of these axes is relevant therefere. 

We notice that very nearly the equation is: 


Y = 78.6 — 4.27(0.7)'(a, — x, — 0.1)? — 0.19(0.7)'(a, + — 0.8)" 
or in terms of the original units 
Y = 78.6 — 0.084(c, — ce, — 20.5)’ — 0.004(¢, + ¢, — 94.0)" 


Thus the canonical variables correspond very nearly, to the difference 
of the concentrations and the sum of the concentrations, the coefficient 
of the difference being much larger than that of the sum. Now re- 
membering that our estimates of the coefficients are subject to error 
and also that the form of equation is probably not entirely adequate, it 
would seem that the data might be explained on the hypothesis that 
yield depended only on the difference between the concentrations and 
not at all on their ‘overall’ level, the best yield being attained whenever 
c, was about 20 grams per litre greater than c,. This hypothesis could 
be readily checked over wider ranges of the variables by further ex- 
periment. 

Assuming this hypothesis was shown to be substantially correct, 
attention which had so far been focussed on the mathematical analysis 
would be shifted to physico-chemical theory. The experimenter would 
ask himself ““What mechanism could produce the phenomenon of yield 
being dependent on this concentration difference?” If he could answer 
that question further experiments would be devised to test his theory. 
Such a theory by contributing to a basic understanding of the mechanism 
of the reaction could, for example, lead to new methods of overcoming 
yield-limiting factors either by modification of the physical conditions 
or by the introduction of other reactants into the system. The fitted 
equation may thus provide not merely an empirical representation of the 
surface near the maximum (useful though this is) but also a valuable 
indication of how the svstem works. 

Now as a consequence of the form of our fitted equation the canonical 
variables Y, and Y, are necessarily expressed as linear functions of the 
quantities x, and z, but it is obvious that usually an underlying ‘com- 
pound variable’ will be some less simple function. For example it will 
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frequently happen that the level of yield will depend on the ratio of two 
concentrations rather than their difference and we shall see later that in 
real examples more complicated relationships occur. The difficulties 
| which this presents are not as serious as they first appear. 

Let us consider the particular example of the yield depending on the 
ratio of the concentrations of two reactants, there being a certain 
optimum ratio. A second degree equation fitted to the logarithms of 
c, and ec, would give an equation similar to that obtained before but 
with a dominant canonical variable (log c, — log c,) instead of (¢, — ¢.) 
and the yield surface plotted in terms of log c, and log c, would contain 
a stationary ridge system. 

Now in practice we should usually be attempting to represent the 
relationship over ranges of concentration which were fairly small 
: compared with the overall magnitudes of the concentrations. The 
appearance of the surface would then not be very different whether it 
was plotted in terms of c, and c, or in terms of log c, and log c, and a 
: ridge system which was represented by equations like (1) would still 
| be found even though the second degree equation were fitted to c, and 
| c, rather than to log c, and log c,. That this is generally true for other 
l types of functions can readily be seen. 

Suppose Y depends only on X = f(c, , c2) and f(c, , c.) can be repre- 
l sented locally reasonably well by the first order terms of its Taylor series 


X = foler , 2) + (Of /e,)o (C1 — Cro) + (Of (C2 — C20) (2) 


where the subscript zero denotes that the value at the point ¢,o , Co is 
taken. Now this equation is of the linear form 


— 


so that if, in the region of the optimum, Y, could be approximated by a 
quadratic function of X, we should have 


Y = Y,+ — X,)’ 


We see therefore that while we should expect that our procedure for 
detecting local factor dependence would have fairly wide applicability, 
the question of what was the relevent form for the compound variable 
would usually be a matter for further speculation and experiment. 

l Returning to ‘he case where yield is dependent on the level of 
2 c,/¢, equation (2) gives for points in the neighbourhood of ¢,5 , ¢2» 


C2 Cx Cio (4) 
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Here therefore the experimenter working with the untransformed 
variables ¢, and ¢, would find a dominant canonical variable NY = ac, + 
be, + din which a,b was roughly equal to ¢.5/¢,) the ratio of the avcrage 
concentrations. This would indicate that proportional changes in the 
concentrations would leave X and hence the yield Y unchanged and 
would lead to recalculation of the equation in terms of log ¢, and log ¢, 
when a closer fit would probably be obtained. 

Usually where the first analysis indicates some ridge system we must 
rely on possible theories of the system to indicate the correct function 
to employ. These theories must of course be checked by further 
experimentation. 

The simple numerical illustration quoted above is hypothetical but 
serves to introduce the following genuine but more complicated example. 


2. THE EMPIRICAL STUDY 


An experimental study of the system concerned has been described in 
reference [1] (first example) and some of the detailed calculations will 
be found in [3].* It was desired to maximise the yield of one of the 
products of a chemical reaction. To do this the yields of this and other 
products of the reaction were experimentally determined under various 
conditions of temperature (7'), concentration of one of the reactants 
(c), and time of reaction (¢). The conditions 7’, ¢ and ¢ were measured 
as degrees centigrade, % concentration and ‘hours on temperature’ 
respectively. 

The results are shown in columns 1, 2, 3, 4, 11, 12, 13 and 14 of 
Table 1. The quantity 4, is the estimated fraction of unchanged starting 
material, 4, the estimated fraction converted to the desired product 
and 4; the estimated fraction occurring as an unwanted by-product. 
The fractions are called the ‘yields’ and are sometimes quoted as 
percentages. The circumflex accent will be used throughout this paper 
to indicate observed or estimated quantities, the ‘true’ values will be 
unaccented. 

For convenience the levels of the variables are coded in columns 
(5), (6) and (7) of Table 1 as follows: 


z, = (T — 167)/5, Zo = (c — 27.5)/2.5, x, = (t — 6.5)/1.5 


The coded values for the first eight experiments are then all at the 
levels +1 and —1 forming a 2° factorial design. When this is augmented 
with experiments 9-15 a ‘central composite’ experimental design [2] 


*The ‘natural units’ given in [3] differ slightly from those quoted in reference [1]. The yields given 
in Table 1 of this paper differ slightly from those in [3] due to refinements previously ignored. 
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[1] is formed suitable for fitting an equation of second degree to any 
observed response. 

A second degree equation fitted to the yields of desired product 43 
for experiments 1-15 indicated a possible planar ridge system. Further 
experiments 16-19 were carried out therefore on the estimated maximum 
plane. In spite of the great differences in the actual conditions employed 
(see table 1) these gave yields close to the maximum value of about 
60% in accordance with prediction. These observations were now 
included in the calculation and the best fitting second degree equation 
using all 19 observations was then: 


Y = 58.78 + 1.90x, + 0.972. + 1.067, — 1.8827 — 0.6923 
— 0.9523 — — — 1.24222; (5) 


where Y denotes the % yield predicted by the equation. 

From Table 2 it will be seen that the sum of squares due to regression 
accounted for 92.6% of the total variation after elimination of the 
mean. From the residual sum of squares an estimate ¢ = 1.81 of the 
experimental error standard deviation was obtained (this may be 
biased upwards due to some inadequacy of the assumed form of the 
equation). Using this estimate for o the standard errors of the co- 
efficients in equation (5) could be calculated. For linear, quadratic 
and interaction terms these were all between 0.3 and 0.5. It appeared 
therefore that equation (5) was reasonably well determined. It was 
found to have the canonical form 


Y — 59.15 = —3.40X; — 0.32X; + 0.20X; (6) 
where X, = 0.7512, + 0.4792, + 0.4552, — 0.349 (7) 
X, = 0.3082, + 0.35627, — 0.8822, + 0.013 (7a) 
X; = 0.5842, — 0.803z, — 0.120x; + 0.485 (7b) 


The centre of the system (that is the point X, = 0, X, = 0, X; = 0) 
had coordination z, = —0.03, z. = 0.55, 2; = 0.23. Thus the axes of 
the system passed through the region in which the experiments had been 
performed and a discussion in terms of these axes was therefore im- 
mediately relevant.* 


*In the case of ridges, especially rising ridges, the ‘centre’ of the canonical system may be found 
almost anywhere on the line or plane of the crest of the ridge. When this centre is remote from the 
region of experiments, we cannot, of course, draw any conclusions about the nature of the surface at 
this remote point. However the preliminary use of the steepest ascent procedure will have ensured 
that the experimenter has already been brought to a point which is close to the ridge. The canonical 
equation can therefore be rewritten in terms of a new o1 “in cluse to the original origin and on the line or 
plane of the ridge. From this form of the equation the principal features of the response surface in the 
immediate region of the experiments may be readily comprehended (see for example reference [1] pp. 37 
and 53 and reference [3] p. 531. 
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Now the canonical variables have the same scale as the original 
variables. That is to say in solid models (like those in Figures 3 and 4) 
in which unit change in 2, , x, , or x; is represented by the same distance, 
unit change in X, , X, and X;, is also represented by this same distance. 
Consequently the relative magnitudes of the coefficients in the canonical 
equation indicate the relative importance of the canonical variables in 
describing the function over the experimental region which has been 
chosen as appropriate for study. The coefficient (—3.40) of X} in 
equation (6) is over 10 times larger than either of the other two co- 
efficients (—0.32 and 0.20). Furthermore the latter are somewhat less 
in magnitude than their errors of estimation. (Appropriate standard 
errors for these constants can be shown to be of the same order of 
magnitude as those of the original quadratic and interaction terms, 
namely about 0.3 to 0.5.) To express the matter a little differently 
within the region in which the fitted equation has some relevance which 
may be roughly defined as —2 < X, < +2, —2 < X, < +2, —2 
< X, < +2, the maximum contribution to the % yield Y of the terms 
in X, is about 14% whilst that of each of the terms in X, and X; is 
only about 1% which is of the same order of magnitude as the ex- 
perimental standard deviation. These facts suggest that the system 
may be described by an equation containing a single canonical variable 
only. 

The refitting ab initio of an equation of the form 


Y — Y(max) = BX’ 
containing only a single variable X which is itself linear in x,, 2. and 2; 


is possible but laborious. An approximation used in [1], [2] and [3] may 


be obtained simply by ignon..g the smaller coefficients in equation (6) 
when we have 


Y — 59.15 = —3.40X? 


with X, defined as before (equation (7)). 


A closer approximation is obtained by fitting by least squares the 
expression 


Y=A+BZ+CZ7 (8) 


where Z is the linear aggregate az, + br, + cx; obtained by omitting 
the constant term in X,. In the present example 


Y = 59.50 + 2.65Z — 3.802’ (9) 
where Z = 0.7512, + 0.47927, + 0.4552, (9a) 
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That is Y — 59.96 = —3.80X; (10) 
where X; = 0.751z, + 0.47927, + 0.4552, — 0.348 (11) 


or in terms of the original variables 


X; = 0.1507 + 0.191e + 0.303t — 31.969 (11a) 


An analysis of variance is shown in table 2. 


TABLE 2 
Analysis of variance table for fitted equations 
Degrees of Source Sum of squares 
Freedom 

357.4 

Full 2nd (equation 9) - 

9 degree equation 371.4 

(equation 5) Remainder 14.0 
9 Residual 29.5 
18 Total after eliminating mean 400.9 


The sum of squares due to the ‘full second degree equation’ (equation 
5) has nine degrees of freedom associated with the nine independent 
constants fitted in addition to the mean. The ‘canonical equation’ 
(equation 9) contains only four independent constants apart from the 
mean. These are two constants of equation (9) and two of the co- 
efficients in (9a) (the third is fixed by the requirement that the squares of 
these coefficients sum to unity). We see that the simpler expression is 
associated with a sum of squares of 357.4 and that the fitting of an equa- 
tion containing five more constants accounts only for a further 14.0 of 
the sum of squares. 

The estimates in equations (9) and (9a) are not linear functions of 
the observations and consequently the number of constants in this 
equation and the number of extra constants associated with the re- 
mainder sum of squares cannot be directly associated with degrees of 
freedom in the usual sense. However the analysis serves to show that 
the simpler expression probably accounts for the data as well as does 
the more complicated one. 

If we put Y equal to its maximum value in (10) we get X, = 0 which, 
when substituted in (11) or (11a), gives a plane of alternative conditions 
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on which Y attains its maximum value. In general on substituting a 
lower value for Y in equation (10) we obtain two values for X, equal in 
magnitude but opposite in sign which when entered in (11) give the 
equations of parallel planes of lower yield ‘sandwiching’ the maximum 
plane as illustrated in figure 3. Sections of this system for three levels 
of the concentration variable are shown on the right hand side of this 
figure. 

At the time when these experiments were carried out (some five 
years ago) the number of chemical yield surfaces which had been 
approximately determined was small and this surface, showing such 
marked dependence between all the variables, was somewhat unexpected. 
Further experiments having confirmed the reality of the system it was 
realised that this was a case where the reaction was sufficiently simple 
to allow a theoretical study which, as it turned out, explained the type 
of surface found. 

Although most chemical systems are more complicated, the study 
serves to show that, as a result of the laws which govern chemical 
systems, ridge surfaces of one sort or another are to be generally ex- 
pected (as experience has in fact confirmed). 


3. THE THEORETICAL STUDY 


The chemical system could be represented by the following sequence 
of competitive reactions 


2a + bNb— aNb + ab (reaction 1) 
2a + aNb — aNa + ab (reaction 2) 


The substance bNb contained a large molecular nucleus N to which 
were attached the two groupings b. In the part of the sequence denoted 
by ‘reaction 1’, one of the b groupings was replaced by an a to form 
aNb which was the required product. However, as is shown in ‘reaction 
2’, under the conditions in which the first reaction could take place aNb 
could destroy itself by combining with more a to form the unwanted 
product aNa. 

The concentration of the starting material bNb was kept constant 
throughout the experiments, the concentration which was varied 
being that of the substance a. The substance ab was chemically inert 
and no reverse reaction took place. 

If the reactions were allowed to continue for a time ¢ a mixture of 
unchanged a and bNb and of the products aNb, aNa and ab was produced. 
The required product aNb had to be separated from the other products 
in subsequent purification stages. 
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To proceed we need to use some simple chemical ideas. As they 
may be unfainiliar to those readers who are not chemists we shall 
briefly explain them as they are needed. 


Concentrations of reactants. 


The chemical equations above indicate the proportions in which the 
actual molecules combine. It is convenient to measure the amounts of 
the various substances in ‘gram moles’, one gram mole being the 
molecular weight in grams of the substance concerned. Thus the equa- 
tion for reaction (1) implies that two gram moles of a combine with 
one gram mole of bNb to form one gram mole of aNb and one gram mole 
of ab. We shall be concerned with the concentrations of the reactants 
in the solvent and these will be expressed as ‘gram moles per litre’. 
The following symbols are used to denote the concentrations and frac- 
tional yields of the various reactants in the system. 


“Fractional” yield 
Reactant Cone. Cone. at time ¢ 
at time ¢ Initially = (concentration) /c2 
Starting a C1 Cio m 
materials bNb Ce C20 Ne 
aNb C3 0 73 
Products ab Cs 0 ns 
aNa Cs 0 ns 


In addition the symbol C denotes the ratio ¢,o/c.) of the initial 
concentrations of the starting materials. In the discussion of section 
2 and in [1] and [3] the concentration of a was measured as a ‘percentage’ 
which was denoted by c. Since the concentration c.. was kept constant 
in our experiment c and C are alternative measures of the concentration 
of a. In fact C = 0.4555c. 

We can now derive a theoretical expression for the yield 7, of aNb 
in terms of 7 the temperature, C the relative concentration of a, and ¢ 
the time, which can be directly compared with the empirical equation. 

Since the number of gram moles of a present in the system in some 
form or other at any time ¢ must be constant and equal to c,, it follows 
that 


+3 +, + = Cro (12) 
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Applying the same reasoning to the nucleus N we have 


C2 + Cy + C5 = Con (13) 

whence nm + a3 + 15 = 1 (14) 
Finally 

+ 2c, = (15) 


By subtracting four times equation (13) from equation (12) and adding 
equation (15) we obtain 


Cr = Cio — + + 4c, 
or m = C — 4+ 2n, + 4m (16) 


Kinetic Theory. 


We now use two simple concepts from the kinetic theory of chemical 
reactions. The law of ‘mass action’ states that, in dilute solution, the 
speed of a chemical reaction is proportional to the molecular con- 
centrations of the substances reacting. Thus in particular if, at time ¢, 
p and q are the molecular concentrations of two substances P and Q 
which are taking part in a non-reversible reaction to form a substance 
R, such that one molecule of P reacts with one molecule of Q to form 
one molecule of R, then the rate of reaction (that is the rate of decrease 
of the molecular concentrations of P or Q or the rate of increase of the 
molecular concentration of R) is given by 


(17) 


where r is the molecular concentration of R at time ¢. Experimental 
results have shown that the law often holds approximately in moderately 
concentrated solutions such as we consider. . 

The quantity k occuring in equation (17) is called the rate constant. 
Its magnitude depends on the temperature (7'). Study of a variety of 
chemical reactions has shown that the relationship between k and T is 


represented fairly satisfactorily by the empirical equation due to 
Arrhenius 


k = aexp {—B/(T + 273)} (18) 


where a and @ are constants depending on the reaction studied. 


i 
¥ 
n 
b 
le 
2) | 


300 BIOMETRICS, SEPTEMBER 1955 


In the sequence with which we are concerned we denote the rate con- 
stant for the first reaction by k’ and the rate constant for the second 
reaction by k and define constants a’, 6’, a and 8 so that 


k’ = a’ exp {—6’/(T + 273)} = a exp {—8/(T + 273)} (19) 
Then the ratio p = k’/k of the rate constants is given by 
p = y exp {—6/(T + 273)} (20) 
where y=a'/a and (21) 
The equation of the ‘theoretical’ surface: 

Now reaction (1) occurred as follows: 

a + bNb+>aNb +56 
a+b—>ab 


the second part of the reaction being instantaneous. 
Similarly for reaction (2) 


a+ aNb—>aNa+ b 
a+b—->ab- 


At some particular temperature 7 then, the rate of disappearance of 
bNb in reaction (1) is pkc,c, . Also the rate of formation of aNb from 
reaction (1) is pkc,c, while the rate at which it is destroyed in reaction 


(2) is keye, . 
We have therefore the pair of differential equations 
—de,/dt = pkey, (22) 
de,/dt = pkey, — (23) 


These together with equations (14), (15), (16) and (19) allow us to 
obtain expressions for 9; , 72 , 73 , 7, and 9; , the yields of the products 
at time t. The derivation is given in section 1 of the appendix. In the 
particular case of the desired product aNb the yield at time ¢ is 


where z, is a function of 7, C, ¢ and c.. depending on the constants a, 
B, y and 6 and defined by 


Caota exp {—8/(T + 273)} 


f 
a 
t 
0 
& t 
p 
a 
C 
th 
la 
i, 


4) 


a, 


RESPONSE SURFACES 301 


where p = y exp {—6/(7' + 278)} (25a) 


To the extent that the various assumptions are justified therefore we 
have an equation for the theoretical yield surface for n; and (see equa- 
tions 74, 76, 77 and 78 of the appendix) incidentally for 7, , nz , n, and 7; 
also. In the form in which it is expressed by equations (24) and (25), 
the characteristics of this surface cannot be readily appreciated so we 
shall proceed by actually fitting this form of expression to our data 
and comparing the resulting surface with that obtained empirically. 
To do this we need to estimate the values of the unknown constants 
from the data. 


The ratio p of the rate constants. 


Considering again the equations describing the reactions we see 
that the ratio p of the rate constants is the ratio of the probability that 
an a will replace a b from bNb to the probability that an a will replace 
a b from aNb. 

Now if the chance of an a replacing a b at a particular position on 
the nucleus N is independent of the type of grouping (a or b) which 
occupies the other position, then the chance of replacement will be 
twice as great with bNb, where there are two positions at which re- 
placement can occur, as with aNb, where there is only one. Thus p 
will equal 2 independently of the temperature 7’. 

Now it is readily shown (see section 2 of the appendix) that the 
maximum yield for aNb is given by 


na(max) (26) 


at which value 


—p/(p-1) 


= p and = p ere (27) 


Consequently if p = 2 then 7; (max) = 0.5, and at this maximum value 
of , = 0.25 and = 0.25. 

The maximum yield actually found was not 50% but about 60%. 
We must conclude therefore that the probability of replacement of a 
particular grouping on a half-substituted molecule is not the same as 
the probability of replacement. on an unsubstituted molecule (a con- 
clusion we should in any case expect from chemical considerations). 
We might however still expect the ratio of these probabilities to be 
largely independent of temperature, at least over the range we have 
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considered. This implies that the temperature constants 6’ and @ in 
equations (19) would be equal and that in equation (20) p = y and 
6 = 0. If we substitute the value 7;(max) = 0.60 in (26) we obtain 
p = 3.4 and the theoretical distribution of yield between the products 
bNb, aNb and aNa when 7»; was at its maximum value would then be 


177 
172 
167 
62 
160 
1S? 
20P THEORETICAL CURVE 3:4 —— 
THEORETICAL CURVE = 2-0 ----- 


1 OBSERVEO VALUES 


10 i iL 
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FIGURE 1. YIELD OF aNb (y:3) PLOTTED AGAINST YIELD OF 6bNb (m) WITH 
THEORETICAL CURVES FOR p = 3.4 AND p = 2. 


m2 = 0.176, ns = 0.600, n; = 0.224. These values agree quite well 
with those found in the final four experiments ‘on the maximum plane’ 
of the empirical surface for which the averages were 


ne = 0.162 nm; = 0.611 ns = 0.213 


That the value p = 3.4, is reasonably consistent with the data in other 
respects can be seen from figure 1 where the observed value 4; is plotted 
against 4#.. The theoretical relationship between 7; and 7, is 


1/p 


— me} (28) 


p 
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Most of the experimental observations are in fairly close agreement 
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with the theoretical curve for p = 3.4 although there seems to be some 
departure for high values of n, . Also there seems to be no evidence 
that the points corresponding to different temperatures follow different 
lines whose general form is changing steadily with temperature. 

The characteristics of our solution are not very sensitive to the 
value of p chosen and we proceed by supposing that p is equal to 3.4 and 
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FIGURE 2. GRAPHS SHOWING VALUES OF THE INTEGRAL w FOR VARIOUS VALUES 
OF z, AND c. 
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is independent of temperature. This implies that, the first substitution 
having occurred, the probability of the second substitution is reduced 
by a factor of 1.7. 

Putting the value p = 3.4 in (25) we have for our theoretical equa- 
tion 


exp {—B/(T + 273)} 
= 24 [ 2°'{2.82"* + 6.82 + 2.4(C — 4)}"'dz (29) 


Jstimates of a and B 


The value of the expression on the right-hand side of the equation 
(29) which we shall denote by w(z, , C) or more simply by w depends on 
z, and C alone or (remembering that the percentage concentration c 
which we have considered in our experiments is equal to C/0.4555) on 
z,andcalone. The integral cannot be expressed in terms of elementary 
functions but its value, for any level of z, and for the values of c we 
have employed in our experiments, can be read off from the graphs in 
figure 2. Each of these graphs was obtained by setting c equal to the 
appropriate value, calculating ordinates of the curve 


y = 2-'{2.82"* + 6.82 + 2.4(C — (30) 


at the 7 equally spaced values z = 1.000, 0.875, 0.750, 0.625, 0.500, 
0.375, 0.250 and then calculating the area between z = 1 and z = z, 
under the curve by numerical quadrature. 

The values of the constants a and 8 can now be estimated from the 
data as follows. Taking natural logarithms equation (29) may be 
written in the form 


Ina — B/(T + 273) (31) 
where u = Inw — Inc» — Int (32) 


For each experiment the value 2, (shown in column 15 of table 1) can be 
estimated from the formula* 


2, = f2 + fis(p 1)/p (33) 


*We notice that in every case the total of #2, 4: and 4s in table 1 is less than the theoretical value of 
100%. This is due partly to difficulties of accurate determination of aNa in the presence of other sub- 
stituents and partly due to some degradation of this product. Because of uncertainty concerning the 
estimate 9s, 2: was calculated from 42 and 43; alone. In references [1] and [3] an empirical surface for aNa 
was fitted and a region was shown in the maximal plane of aNb where less than 20% of aNa was obtained. 
In the theoretical equations the yields of both products depend only on z: and consequently for any sur- 
face for which the yield 93 of aNb was constant the yield ns of by-product aNa would be constant also. 
The region found in the empirical study where aNa was less than 20% probably occurred because degra- 
dation of this product was favoured by reaction conditions in this ian The effect of this 
degradation is not allowed for in the theoretical study. 
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or putting p = 3.4 
2, = fle + 0.7064; (34) 


The corresponding values of w for each experiment may now be obtained 
from the values of z, by reading from the appropriate graph in figure 2. 
Finally the values of @ shown in column 16 of table 1 may be obtained 
by substituting values of ¢ and 2 for each experiment in (32) remembering 
that the value for c.)5 was kept constant at 3.10. 

From the form of equation (31) we see that we may now obtain 
estimates of In a and B by fitting a regression line of @ on x = (T' + 273)7* 
by the method of least squares. 

We find 


16.86 + 3.62 (35) 


Ina 


8 = 10,091 + 1,595 (36) 
The quantities following the plus and minus signs in (35) and (36) 
are the formal ‘standard errors’ calculated in the usual way from the 
residual sum of squares. It is clear from inspection of the table that 
the deviations from the regression line contributing to this sum of 
squares still contain components due to ¢ and ¢ indicating that the 
theoretical expression does not give a perfect fit to the data. This is 
not surprising first because of the assumptions we have had to make in 
the derivation and second because the levels assumed for time ¢ and 
concentration ¢ are not entirely appropriate. Doubt concerning the 
level of ¢ exists because, in addition to the reaction occuring during the 
‘time on temperature’, some reaction will also occur while the reaction 
vessel is being heated up and this is difficult to allow for. The value of 
c may not be entirely appropriate because owing to solubility factors 
the effective concentrations in the solvent may be slightly different at 
different temperatures and at different stages of the reaction. In spite 
of these limitations a reasonably close representation of the experimental 
data is achieved by the theoretical expression (29) which contains only 
three adjustable parameters (p, a and 8) as compared with the ten 
adjustable parameters (85 , 8, , --+ , B23) of the empirical expression. 


Comparison of the theoretical and empirical surfaces. 


Using our estimates, @ and 8, we may now calculate the value of z, 
and hence the values of 72 , 7; and , predicted by our ‘theoretical’ 
equation for any desired level of 7’, c and ¢. Figure 4 shows the contours 
of the resulting ‘theoretical surface’ for comparison with Fig. 3. 
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FIGURE 3. CONTOURS OF EMPIRICAL YIELD SURFACE WITH SECTIONS AT THREE 
a. LEVELS OF CONCENTRATION 
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FIGURE4. CONTOURS OF THEORETICAL YIELD SURFACE WITH SECTIONS AT THREE 
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It is seen that there is remarkably close agreement between the 
general characteristics of the two surfaces one of which has been obtained 
entirely empirically and the other derived on a particular theory of the 
mechanism of the reaction. In each case there is a whole surface rather 
than a unique point on which the maximum yield is obtained surrounded 
on either side by surfaces of lower yield. 

A final point of some interest is that z, and hence the yield 7; is in 
fact a function of four quantities 7, C, t and c.. which determines the 
‘overall concentration’ of the reactants. Of these only the first three 
were varied in the experiments, and c,. was kept constant. Had we 
included ¢y>) as a variable which would have been a perfectly sensible 
thing to do, then the yield surface would still have been completely 
described in terms of a single canonical variable and there would have 
been a redundancy of three variables instead of two. 


An Analogy. 


A simple picture* of what is occurring can perhaps be gained from the 
following analogy. 

Imagine a billiard table on which a number of black and white balls 
are in continuous motion. Suppose that when a black ball collides with 
a white ball a blue ball is produced and that when a black ball collides 
with a newly formed blue ball a red ball results. 

In this analogy black and white balls correspond to the molecules 
of the starting materials a and bNb, blue balls to molecules of the required 
product aNb, and red balls to the unwanted aNa. Starting off with a 
given number of black and white balls we can see that as soon as the 
system is set in motion blue balls will begin to appear and these in turn 
give rise to an increasing number of red balls. Provided that initially 
there is a sufficient excess of black balls the number of blue balls will 
increase to a maximum then fall off until finally only red balls and excess 
black balls remain. 

Clearly the proportions of the various sorts of balls on the table at a 
given instant will depend on the following variables: 


(1) The time which has elasped since the start. 

(2) The speed with which the balls move. 

(3) The relative numbers of black to white balls at the start. 
(4) The total number of white balls at the start. 


*This is intended to provide only a very rough parallel which may be of assistance to non-chemist 
readers. The real mechanism of chemical reactions is known to be much more complicated and cannot 
be accounted for by simple collision. For example only a small proportion of molecular ‘collisions’ actu- 
ally result in reaction and this proportion is dependent on temperature. 


| 
. 
| 
{ 
i 


308 BIOMETRICS, SEPTEMBER 1955 


These correspond to the variables ¢, 7, C and the overall concentration 
C29 (temperature 7’ being linked to speed by the Ahhrenius equation). 
Suppose for fixed conditions of (2) (3) and (4) the time is noted for the 
maximum proportion of blue balls to be produced. If now conditions 
(3) and (4) are kept the same but the speed with which each ball moves 
is doubled the effect will be like that of showing a cinematograph film 
at twice the rate, an identical sequence of events will be gone through 
‘twice as fast and in particular the same maximum proportion of blue 
balls to the initial number of white balls will be produced but in half 
the time. Similarly if conditions (2) and (3) are kept constant but the 
initial number of black and white balls on the table is doubled and if we 
ignore the effect of interference then again a similar sequence of events 
will occur but at twice the speed and again the same maximum propor- 
tion of blue balls to the initial number of white balls will be produced 
but in half the time. . 

The effect of change in factor (3), the relative number of black and 
white balls is less easily appreciated intuitively. However we can see 
that since the relative rates at which white balls are disappearing and 
red and blue balls appearing is at any stage completely independent of 
the number of black balls (since any change in the number of black 
balls effects both these rates proportionally) it follows that the pro- 
portion of blue balls relative to white balls and the proportion of red 
balls relative to white balls follows precisely the same course whatever 
the number of black balls. Consequently once again the same maximum 
proportion of blue balls to white is produced whatever the proportion of 
black to white balls. It is evident that for such a system a maximum can 
be obtained for almost any level of a particular variable provided the 
other three variables are suitably adjusted. 


4, THE CANONICAL VARIABLE 


The part played by z, in the theoretical equation is exactly parallel 
to that played by the canonical variable X, in the empirical equation. 
This is seen most clearly if we consider a case where z, may be obtained 
as an explicit function of 7, c, and ¢. That is to say a case where the 
integral w in equation (25) may be explicitly evaluated. We have 
noted already that on the simplest view of the reaction we would expect 
the ratio p of the rate constants to equal 2, and it is readily seen from the 
form of equations (24) and (25) that, apart from the maximum yield 
of n; being 50% rather than 60%, the general characteristics of the 
resulting surface will be the same with this value as they are with the 
value p = 3.4. Taking p = 2 we have 
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where (see section 3 of the appendix) z, is now explicitly defined in terms 
of 7’, c and t by the equation 


2, = (C — 4)/{C exp — 4)ta exp {—8/(T + 273)}]— 4} 


Now subtracting 7, (max) = 0.50 from both sides of (37), and writing 
Y for 100 », (to agree with the notation of the empirical surface) we 
have 


Y — Y(max) 


200z,(1 — z,) — 50 (39) 
— {7.07(2z, — 1)}? (40) 


Writing W = 7.07 (2z, —1) we see that the theoretical surface is com- 
pletely described by the pair of equations 


Y — Y(max) = —-W’ (41) 
Now (equation 10) the empirical surface is closely approximated by 
Y — Y(max) = —3.80X/? (43) 
where substituting C = 0.4555c in (lla) we have 
X{ = 0.150T + 0.419C + 0.303¢ — 31.969 (44) 


If we write W = (3.80)!X{ we see that the empirical surface is 
approximately described by the equations 


Y — Y(max) = —W’ (45) 
W = 0.292T + 0.817C + 0.5914 — 62.320 (46) 


which are directly comparable with (41) and (42). 

The ‘theoretical canonical variable’ W is a more complicated function 
of 7, C and ¢ than is the empirical canonical variable W. The latter is 
necessarily a simple linear function of T, C and t, and consequently 
contour surfaces of constant yield in figure 3 are necessarily planes. 
However over the regions considered these planes do provide a reason- 
able approximation to the curved contour surfaces of Figure 4 as (for 
the reason given in Section 1) we might expect them to. 

In the discussion above we have compared the canonical variable 
of the empirical surface with the ‘theoretical canonical variable’ arising 
in the particular case when p = 2. When p = 3.4 a similar situation 
will exist and although »; will not be a quadratic function of z, yet a 
quadratic function will still closely approximate the true curve near 
the maximum. 
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5. TEMPERATURE DEPENDANCE OF p 


In our derivation we have supposed that the ratio p = k’/k of the 
rate constants, was itself independent of temperature. To put it 
another way we have supposed that the temperature constants 6’ and 8 
of equation (19) were equal. This supposition is supported by the data 
over the range of values studied. It is interesting however to consider 
how the surface would be affected if this were not true and this is perhaps 
best done by considering an example. Let us suppose that, at the 


p had increased to 3.4. Substituting these values in equation (20) we 
find that this implies that y = 12.63 and 6 = 5.132. If we suppose 
that the values for the constants a and 8 were the same as before we 
then have 


In k’ = 29.49 — 15,223/(T + 273) (47) 
In k = 16.86 — 10,091/(T + 273) (48) 
In p = Ink’ — Ink = 12.63 — 5,132/(T + 273) (49) 


The solid contour model for the (yield: temperature, concentration, 
time) surface which would then be found is shown in Figure 5 together 
with sections taken at various levels of concentration. The diagrams 
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FIGURE 5. CONTOURS OF THEORETICAL YIELD SURFACE WHEN p DEPENDS ON 
TEMPERATURE. 
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were prepared by carrying through the numerical integration and 
subsequent calculations as before for each of the concentrations 22.5%, 
27.5%, and 32.5% and for five values of p corresponding to five values of 
the temperature calculated from equation (49) as follows: 


T = 157 162 167 172.5 177 
p= 200 234 262 3.00 # £3.40 


The three concentration sections were then prepared by drawing smooth 
contour lines through the points calculated at these temperatures and 
finally from these sections the representation of the solid model was 
obtained. 

Considering this solid model we see that we now have a situation 
where there is a ‘rising’ ridge instead of a stationary one. The value of 
the yield steadily increases on the ridge as the temperature is increased. 
A section of the solid model for a particular value of time or concen- 
tration gives a two-dimensional rising ridge system running diagonally 
to the axes of the variables like the concentration sections shown. A 
section of the solid model for a particular value of temperature on the 
other hand gives a two dimensional s/ationary ridge system of the type 
considered before. 

In general we must expect a surface of this sort to occur in the 
common case where a competitive system is influenced by a highly 
dependent set of variables and one competitor is favoured by a certain 
direction of movement in the variables. 

In the example we have considered the rising ridge results from 
the first reaction in the sequence being favoured by high temperatures. 
It is easy to imagine other examples of this sort of phenomenon. For 
instance in some systems the rates of competing reactions depend on 
different powers of the concentration terms (see for example reference 
[5]). In these circumstances a rising ridge associated with concentration 
would be expected. 

It is of some interest to consider the behaviour of the empirical 
method when a rising ridge of this sort occurs. The typical situation 
encountered is as follows. Analysis of the fitted second degree equation 
yields a canonical equation 


Y- Ys = B,,X} + + B33X3 (50) 


in which (as with the stationary ridge system discussed in section 2) 
one of the coefficients (say B,,) is negative and comparatively large 
and the other two (B,. and B,;) are small. The centre S of the fitted 
system is remote from the design. To determine the nature of the 
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system in the region where it applies (in the neighborhood of the design) 
a new origin S’, situated as closely as possible to the design centre and 
in the plane X, = 0 is taken. Suppose this new origin is at the point 
X, = X25. and X; = X35. in the plane X, = 0. Then writing X; = 
X, — X25. and X; = X; — X35. for the new coordinates and sub- 
stituting these in equation (50) we have 


Y- Ys: + B,X3 + B3X3(+ + B33X3’) (51) 


.Where B, and B,; measure the slopes of the yield surface on the plane of 
the ridge and will not be negligible if the ridge is non-stationary. If we 
can ignore B,, and B;; as negligible in comparison with B,, , equation 
(51) is that of a system having contour surfaces which are parabolic 
cylinders like that in Figure 11(f) of [1] or Figure 11.8(E) of [3]. 

We see from equation (51) that, if we wish to move in a direction 
so that Y — Yq. is made as large as possible, we should, (since B,, is 
negative) make the contribution of the first term equal to zero by 
keeping X, = 0 (by remaining on the plane of the ridge). Also we 
should proceed so that the contribution of the terms B,X} and B,X3; 
is as large as possible. For movement through a given distance r on the 
plane this will be achieved by following the direction of steepest ascent. 
Thus X; and Xj; should be varied in proportion to B, and B,. We see 
that this movement which would be at right angles to the yield contours 
on the plane of the ridge would in the present example lead in the 
direction of rising temperature. 

In general where a competitive system is affected by a single factor 
like temperature or concentration this type of analysis will be helpful 
in identifying the factor responsible. At the same time it should be 
borne in mind that in the presence of a ridge system unequivocal 
identification by this means is not possible. For instance in the present 
example we see from figure 5 that we could attribute the effect found to 
the joint influence of time and concentration instead of to temperature. 
As always it is necessary to consider evidence of this kind in the light 
of possible theoretical explanations for the phenomenon observed. 


6. CHOICE OF METRICS FOR VARIABLES 


In the experiments described the standardised variables x, , x2 , 2; 
of the design were linearly related to the natural variables 7’, c and t 
as follows z, = (T — 167)/5, x. = (¢ — 27.5)/2.5, 3 = (t — 6.5)/1.5. 
The unit change of a given variable in the design, taken as a percentage 
of the average departure of the variable from its natural origin we may 
call ‘the coefficient of unit change’, U. Remembering that the natural 
origin for temperature is — 273°C, in the present case we have 
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U (Temp.) = 100 & 5/440 = 1.1% 
U (cone.) 100 X 2.5/27.5 = 9.1% 
U (time) = 100 X 1.5/6.5 = 23.1% 


When experiments are to be conducted with the intention of fitting an 
empirical response surface doubt may exist as to whether we should 
relate the standardised variables of the design to the natural variables 
by a linear scale, as was done in this experiment, or by a log scale, or a 
reciprocal scale, or in some other manner. 

When the coefficients of unit change are small the surface plotted 
in terms of transformed variables, like those above, will usually be 
almost the same in appearance as when plotted in terms of the un- 
transformed variables, since the relationships over the ranges studied 
between transformed and untransformed variables will be almost 
linear in this circumstance. Even so by appropriate choice of metrics 
the interpretation of the fitted equation may he greatly simplified as 
will be illustrated in section 7. 


Theoretical Surface in terms of the New Metrics. 


In the present example we have seen (equation 25) the important 
part which is played by the function 


Coot a {exp — B/(T + 273)} (52) 


in describing the yield surfaces. In fact the time ¢, the overall concentra- 
tion C2. , and (if p is independent of temperature) the temperature T 
enter the theoretical equation only through this expression. Its loga- 
rithm is 


In Coo + Int + Ina — B/(T + 273) (52a) 


which is a linear expression in functions of 7’, c.)5 and ¢. If therefore we 
use a reciprocal scale for absolute temperature and logarithmic scales for 
the time and the overall concentration the contours of the ridge system 
will appear as planes in the space of these variables. In figure 6(a) a 
section of the theoretical yield surface already given in Figure 4 is shown 
with time plotted on a log scale and temperature on a reciprocal scale. 

When p is assumed to be temperature dependent, 7' enters the 
expression (25) on the right hand side as well as on the left. However 
as will be seen from figure 6(b), over the ranges considered, the ridge 
is again rendered almost straight by plotting on the basis of reciprocal 
absolute temperature and log time. From these diagrams it will be 
seen that, when the variables are scaled in terms of these new units, a 
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Time 
FIGURE 6(a) FIGURE 6(b) 


CONTOURS OF THEORETICAL YIELD SURFACES WITH TIME PLOTTED ON A LOG 
SCALE AND TEMPERATURE ON A RECIPROCAL SCALE. 


considerably closer fit might be expected from a second degree equation. 

For many other chemical reactions the compound variable in (52) 
will play an equally important part in the response function and in the 
absence of other evidence the choice in empirical investigations of 
reciprocal scales for absolute temperature and log scales for time and 
overall concentration would seem to be indicated. One would expect 
that, on these scales of measurement, the system could be more precisely 
represented by a simple equation. 

The choice, in the present example, of an appropriate metric for C, 
the concentration of reactant a relative to that of reactant bNb, is less 
easily decided. However, if we consider the particular case where 
p = 2 the equation of the surface for z, may be written 


We are particularly interested in the region of the surface where 7; 
takes its maximum value. Here z, = 3. Substituting this value in 
equation (53) we have for the equation of the surface on which 7; is a 
maximum 


—B(T + 273)"' + f(C) + nt + Ina + Incy, = 0 (54) 


where 
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would seem to be an appropriate metric for C. 


Empirical Surface in Terms of the New Metrics. 


The expectation that an equation of second degree might fit the 
data more closely when the variables were expressed in terms of the 
new metrics (7' + 273)7', f(C), and In ¢, is borne out, as is seen from 
the analysis of variance in Table 3. Since, in this example, the co- 
efficients of unit change are not large no dramatic reduction in the 
residual sum of squares is to be expected however. 


TABLE 3 
Analysis of Variance before and after transformation of the variables 


Sums of squares 
Source DF 

Original New 

Metrics Metrics 
Due to Regression 
(after elimination of the mean) 9 371.4 380.2 
Residual 9 29.5 20.7 
Total (after elimination of the mean) 18 400.9 400.9 


In refitting the equation after changes in the metrics use may be 
made of the estimates already obtained in. the following way. When 
recoding the data for the new metrics we need only ensure that the 
coded data are linearly related to the chosen functions. We can there- 
fore arrange matters so that the coded values of the independent 
variables z, , <2 , Z; are “close” to those of the original independent vari- 
ables x, , 22, 23. In the present example this was done by arranging that 
the re-coded levels of the variables for the first eight experiments (the 2° 
factorial part) were —1 and +1 as before. For example, for temperature 
we require a coding z, = a + b (T + 273)~* such that z, = —1 when 
T = 162°C and z, = 1 when T = 172°C. Substituting these values in 
the equation and solving we obtain a = 88, b = 38,715 whence the coding 
used for temperature was 


Z, = 88 — 38,715(T + 273)"' (56) 
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Proceeding in a similar way with the remaining variables we have 
= —27.9299 + 9.9965/(C) (57) 
%; = —7.84868 + 4.25532 In ¢ (58) 


These recoded values are shown in columns (8), (9) and (10) of table 1. 
The differences from the original coding are not very large and we can 
therefore regard the coefficients by , b, , b, etc. already obtained as 
first approximations to the new coefficients b, , b, , b ete. Accurate 
values of b, j b, ; b, etc. were obtained by writing down the normal 
equations (after elimination of the mean*) for the new recoded variables, 
inserting b, , b, , bs; ete. as first approximations and then obtaining 
successively closer and closer approximations by “one at a time’ and 
“steepest ascent” relaxation (see for example [7] and [8]). If the 
elements of the new inverse matrix are required these can be obtained, 
(for example, by Hotelling’s method [9]), using the elements of the 
known inverse from the original coding as the first approximation. 


7. BASIC CONSTANTS AND CANONICAL VARIABLES 
We find for the newly fitted equation 
Y = 59.15 + 2.00z, + 1.014, + 0.67%, — 2.00%; — 0.722% 
— 1.00%; — 2.78%,4. — 2.18%,4; — 1.164245 (59) 


On comparison with equation (5) it will be seen that, as would be 
expected, the coefficients are quite close to those obtained before. 


The canonical form of the equation is also similar. We have 
— 59.51 = —3.50X? — 0.41X? + 0.19X3 (60) 
where X, = 0.760%, + 0.473%, + 0.446%, — 0.329 (61) 


On decoding we now have an expression for the canonical variable Zz. 
in more appropriate functions of the natural variables. 


X, = —29,423(7 + 273)"' + 4.728f(C) + 1.897 In t + 49.839 (62) 


Putting X, = 0 we see that on the new scales the empirically fitted 
equation gives for the plane of maxima 


—29,423(T + 273)"' + 4.728f(C) + 1.897 In ¢ + 49.839 = 0 (63) 


Now we have seen (54) that for p = 2 the theoretical equation of the 


*By fitting the equation in the form of y — = — #1) + In(r2 — F2) + — 72) + 
* , ete. the convergence of the iteration is speeded up. 
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maximum plane is 
+ 273)" + f(C) + nt +Inatinew =0 G64) 


with which (63) may be compared. 

If the theoretical system exactly fitted, the coefficients of f(C) and 
In ¢ would be equal and on dividing equation (63) by this common 
coefficient (63) and (64) would be exactly comparable. Here to enable 
comparison to be made, we divide (63) through by the geometric mean 
2.995 of the coefficients 4.728 and 1.897 of f(C) and In ¢ to obtain 


—9,824(T + + 1.579f(C) + 0.633 Int + 16.641 =0 (65) 


Comparison of (65) and (64) shows that the canonical variable X, is 
carrying as coefficients the constants of the reaction. The value 9,824 
is an estimate of 6 and (since In coo = 3.10) 16.64 — 3.10 = 13.54 is‘an 
estimate of a. (Both are in reasonable agreement with the estimates 
of equations 35 and 36). 

The lack of equality of the coefficients of f(C) and In ¢ does not 
support theoretical expectation. However, this is probably because 
p ~ 2 and also because for reasons given already, the ‘effective reaction 
times’ are greater than the values assumed and the ‘effective relative 
concentrations’ are less. It would be possible to calculate appropriate 
‘correction factors’ which could indicate how our basic theory should be 
modified. We shall not pursue this topic here however. 

This example served to point out to us two interesting possibilities 
which have been borne in mind and developed in later investigations. 
These are: 


1) Where sufficient is known of the nature of the basic mechanism 
(ie. the kinetics in chemical examples) we may proceed to fit a 
surface based on this mechanism rather than on the empirical 
Taylor series. 

2) When we start off with little knowledge of a system careful study 
of the characteristics of a fitted empirical surface, particularly as 
elucidated by canonical analysis, can lead to a conception of the 
probable basic mechanism. A first guess can then be tested and 
improved upon by a process of ‘experimental iteration’. 


Of these (2) is possibly the more important and may have appli- 
cations outside chemistry, for example, the characteristics of the surface 
for a fertilizer trial in agriculture or a nutrition experiment in biology 


might supply important information on the metabolism of the plant or 
animal cell. 
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8. BASIC CONSTANTS AND THE DIRECTION OF STEEPEST ASCENT 


We have seen above that, in the example we have studied, essential 
information concerning the reaction constants was contained in the 
coefficient of the canonical variable X, . 

It is perhaps worth noting also that the direction of steepest ascent 
would also contain much of this information. For we see that if the 
true surface could be represented in terms of a single canonical variable 
so that 


n — n(max) = A(pz, + + + 8)” (66) 


then multiplying out this expression and equating the coefficients to the 
constants By , , , we have 


Bo = max + ds” B, = 2dsp = Bs = 2Asr 
Buy, = Ap” B22 = dq” B33 = dr? (67) 
Biz = 2dApq Bi; = 2dpr 23 = 


Thus for this type of example the constants 6, , 8. , and 8; which define 
the direction of steepest ascent are in fact proportional to the co- 
efficients p, q, r in the canonical variable, as is at once obvious from 
the consideration that the direction of steepest ascent is at right angles 
to the contour plane of maxima in the space of the factors. 

In examples like the present one estimates of 8, , 8. , 6; , and B,., 
8,3; and £.; are available after the first eight experiments. If these are 
such as would support the hypothesis that 


fu. fa (68) 


we may begin to suspect (although we are entitled to do no more) that 
we may be dealing with a system having a single dominant canonical 
variable. 


9. SOME REMARKS ON THE PROCESS OF SCIENTIFIC INVESTIGATION 


The technique of scientific investigation contains two essential 
processes 

a) the devising of experiments suggested by the investigator’s 
appreciation of the situation to date and designed to elucidate it further; 

b) the examination of results of experiments performed to date in 
the light of all background knowledge available, with the object of 
postulating theories susceptible of test in future experimentation. 

The first is essentially a movement from ‘theory’ to ‘experiment’ 
indicated in Figure 7 by an arrow pointing upwards, the second is a 
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movement from ‘experiment’ to ‘theory’ indicated by an arrow pointing 
downwards. 

During a complete investigation these processes of synthesis and 
analysis used in alternation will normally be employed many times 
and, by what we may call ‘experimental iteration’, the investigator 
should be led closer and closer to the truth. 

Most investigations first pass through a ‘speculative’ stage. Here 
statistical methods can rarely be of help but it is nevertheless vital 
that this early work should be done fully and with imagination, other- 
wise later effort may be wasted in detailed investigation of the wrong 


SPECULATIVE STEEPEST EMPIRICAL THEORETICAL 
ASCENT SURFACE SURFACE 
FITTING STUDY 
EXPERIMENT 


KNOWLEOGE 


FIGURE 7. DIAGRAMATIC REPRESENTATION OF PROCESS OF EXPERIMENTAL 
ITERATION. 


basic system. Statistical methods provide efficient tools for investi- 
gating a system whose general nature has been broadly decided. They 
provide no substitute for basic scientific thinking about what the 
system to be investigated should be. It is the duty of a statistician 
to dissuade the experimenter from employing these methods until 


_he has done sufficient preliminary work to decide what basic system he 


should explore more fully and incidentally until he has acquired reason- 
able skill in carrying out experiments with the system. 

To appreciate the interplay of processes (a) and (b) let us imagine 
the beginning of a chemical investigation. At stage 1 in Figure 7, the 
experimenter would have some, perhaps not very precise, idea as to the 
general way in which some chemical might be manufactured. Process 
(a) would begin in his mind something like this—‘“I believe that in 
suitable circumstances reactant A would combine with reactant B 
to form C. From theoretical knowledge, my own experience, and 
other people’s experience of similar reactions cited in the literature, 
I should think that conditions X might be worth trying’’. 

The appropriate experiment would then be performed (stage 2 in 
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the diagram). As soon as the results were seen the second type of 
mental process, denoted by (b) above would start—“The reaction did 
produce a little of the desired product C but there was a very large 
amount of unwanted product D also present. This could be due to the 
large amount of water which had to be used to dissolve the reactants 
and which would favour formation of D”’. 
He has now reached stage (3) at which point the first kind of mental 
- process (a) begins again—“If I carry out the reaction using a non- 
aqueous solvent I may avoid the large production of by-product D”’. 
He is thus led to perform a further experiment at stage (4) using a 
non-aqueous solvent, and so on. 

When the speculative experiments have led to some reasonably 
well defined system which is sufficiently promising to justify develop- 
ment much will be gained by using the powerful tools provided by 
applied mathematics such as “steepest ascent’’, empirical surface 
fitting and “theoretical surface study”’. 

It should be noted that these techniques still employ the basic 
processes (a) and (b), and that our applied mathematics helps as much 
with (b) as it does with (a). Thus we are not only concerned with 
designing experiments which will estimate the ‘effects of the factors’ 
(process a) but also with making calculations (for example of the 
direction of steepest ascent, and of the canonical form of a fitted equa- 
tion) which suggest what further experimentation should be performed 
(process b). 

There has been a tendency for some statisticians to concentrate 
on the, perhaps rather rare, experimental situation where a single 
group of experiments is planned and from the result some irrevocable 
decision is to be made. Such an investigation is concerned exclusively 
with a single application of procedure (a). It is customary to emphasise 
in such a situation the danger of taking action on a hypothesis suggested 
by inspection of the data, but not in mind when the experiment was 
planned. 

This point has sometimes been misunderstood and interpreted to 
mean that process (b) was in some way suspect and should not be 
indulged in. In fact of course it is fundamental to investigation which 
would be quite barren without it. In researches of the type we have 
discussed the experimenter and the statistician should examine the 
data most carefully and any hypothesis which appeared possible and 
important should be submitted to the test of later experimentation. 

The particular sequence of techniques shown in Figure (7) is not to 
be regarded as providing a set pattern which should always be followed. 
The use of these and other devices will be decided by such circumstances 
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as the degree of basic knowledge concerning the mechanism of the 
system and the object and importance of the study. If, for example, 
the experimenter were required merely to make in the laboratory a 
few pounds of some rare organic chemical for some special research 
purpose, he would be quite content to do a few speculative experiments 
sufficient to allow him to prepare this small amount of material in 
reasonable quality, the finding of an economic process would not be 
worth the trouble. At the other end of the scale, for a large and ex- 
pensive manufacture an elaborate and costly study would be justified. 
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APPENDIX A—DERIVATION OF CERTAIN RESULTS QUOTED IN THE PAPER 


1) General solution of the differential equations. 
From equations (22) and (23) we have 


de;/de, = p | (69) 
Write c; = c.s, then 
de;/de, = 8 + c,(ds/dez) (70) 
Substituting (70) in (69) we have 
—dc, ds 
c 1+s8(p—1)/p 
—In + constant = in {1 + — 1)/o} (72) 


Now when ¢t = 0, ¢. = C29 , and s = 0 hence the constant in In cy, . 
Now = and C3/Ce na/N2 


Thus 
p— 1 Ns —p/(p-1) 
+ 1) (73) 
Write % = 2° (74) 


After substituting this expression in (73) and rearranging we have 


w= - (75) 
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and using equation (14) 


From equation (15) 
om p hed 2 
Finally from equation (16) 
_2p_ (po — 2), 
m=C (78) 
Now from equation (22) 
—d 
= Coopknine (79) 


Substituting (78) in (79) and noting that dy,/dt = pz’'dz/dt we have 
after rearrangement 
dt 
dz 
z{2(p — 2)z2” + 2pz + (p — 1)(C — 4)} 


(80) 
whence 
kent = (9 — 1) — Qe’ + + (9 — NC 81) 


and using the Arrhenius equation (18) we obtain equation (25). Equa- 
tions (74), (75), (76), (77) and (78) together with (81) yield the complete 
solution allowing the yields 7, , m2 , 73 , 7 and 7; to be evaluated at 
any time ¢. 


2) Maximum yield of aNb. 
If we put dc;/dt = 0 in equation (23) we have 


13/N2 = p (82) 

substituting in (73) =a (83) 
Whence 

m= pe” (84) 


which is readily shown to be a maximum. 
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3) Solution when p = 2. 
In the particular case when p = 2 equation (81) becomes 


kept = z ‘(42+ C — 4)‘ dz (86) 

giving keyt = {In (42, + C — 4) — In (C2) }/(C — 4) (87) 
After rearrangement this yields the explicit function for z, 

2, = (C — 4)/[C exp {e.(C — 4)tk} — 4] (88) 


whence using the Arrhenius equation (18) we obtain equation (38). 


REFERENCES 


{1] Box, G. E. P. The Exploration and Exploitation of Response Surfaces: Some 
general considerations and examples. Biometrics, 10: 16, 1954. 

[2] Box, G. E. P. and Wilson, K. B. On the Experimental Attainment of Optimum 
Conditions. J. R. Stat. Soc., B, 13: 1, 1951. 

(3) The Design and Analysis of Industrial Experiments, Box, G. E. P., Connor, L. R., 
Cousins, W. R., Davies, O. L. (Editor), Himsworth, F. R., and Sillito, G. P., 
London: Oliver and Boyd, 1954. 

[4] Arnold, D. 8., Box, G. E. P., Erickson, E. E., Hunter, J. S., Nelli, J. R., Pike, 
F. P. The application of Statistical Procedures to the Flooding Capacity of a 
Pulse column, Progress Report No. 3 Contract No. AT-(40-1)-1320. 

[5] Erickson, E. E., Hunter, J.S., Nelli, J. R., Pike, F. P. The flooding capacity of a 
Pulse Column on the Benzene-Water System. Progress Report No. 4. Contract 
No. AT-(40-1)-1320. 

{6] Hammett, L.P. Physical Organic Chemistry, 1st Ed., 1940. New York: McGraw- 
Hill. 

7] Fox, L. A short account of Relaxation Methods. Quart J. Mechanics and Applied 
Mathematics, 1: 253, 1948. 

[8] Booth, A.D. An application of the Method of Steepest Descents to the Solution 
of systems of Non-Linear Simultaneous equations. Quart. J. Mechanics and Ap- 
plied Mathematics, 2: 460, 1949. 

[9] Hotelling, H. Some new methods of Matrix Calculation, Ann. Math. Stat., 14: 1, 

1943. 


| 
t 
) 
) 
) 
5) 


DESIGN AND ANALYSIS OF TWO PHASE EXPERIMENTS 


G. A. McInryre 
Division of Mathematical Statistics, 
C.S.1.R.0., Canberra, Australia 

Introduction 

It sometimes happens in experimental work that the effects of 
different treatments cannot be measured directly and a further stage 
of testing is required in order to evaluate them. Examples of this type 
of situation are studies of the effect of conditions of growth of parent 
material on resistance to disease or productivity of progeny; the survival 
of nodule bacteria under various conditions of storage and appraised 
by inoculating appropriate legume seedlings; and the effect of various 


treatments on virus multiplication in leaf tissue, the concentration of 
virus being ascertained by lesion counts on indicator plants. 


Principle of Design 


In order to have a measure of consistency of performance, due to 
treatment, and a valid basis for a test of significance it is essential that 
there should be replication in the first phase. Further, it is essential 
that the product of each plot of the first phase should be separately 
evaluated in the second phase. 

Replication in the second phase is not necessary but is highly de- 
sirable where uncontrollable variation in this phase is large relative to 
anticipated effects. The comparison of sugar cane varieties in sugar 
content per unit weight involves a replicated variety trial in the first 
phase followed by chemical analysis of the product of each plot, this 
perhaps being done in duplicate or triplicate. Replication in the second 
phase is in this case usually done only as a check against analytical 
mistakes. On the other hand if one wished to determine the effect of 
various spacings between plants in a seed crop on the production per 
acre in the following generation then with anticipated small or no 
effects of spacing one would probably (a) replicate the testing of seed 
from each plot of the first phase and (b) employ a design in the second 
phase which would enable the elimination of major changes of soil 
fertility and still permit a valid analysis of the total yields for seed 
deriving from each first phase plot. 
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design are assayed separately from other strata but the same form of 
design and degree of replication is used for all strata. The design should 
be such that. the error variance for every comparison of pairs within 
all strata is the same. 

For example suppose that six treatments are compared in the first 
phase in a randomised block arrangement with four replicates. The 
material from each replication could be assayed in the second phase 
using 2 6 X 6 latin square. Here squares are confounded with first 
-_phase blocks while rows and columns within squares are orthogonal 
to treatments within blocks so that effects associated with strata in 
neither the first nor second phases contribute to treatment and error 
mean square in the analysis. 

(3) First Phase: Any stratified design 

Second Phase: One or more repetitions of the design with one- 
to-one correspondence of plots in the first phase and assay plots in the 
second phase. Subsequently there would be randomisation of strata 
and plots within strata in the second phase. Thus if the numbered 
plots of a latin square are re-randomised by rows and columns to give 
a design for the second phase, then material from a particular numbered 
plot in the first phase would be tested on the plot of corresponding 
number in the second phase. In this class of design there is complete 
confounding of stratification between the two phases. 

(4) First Phase: 6 X 6 latin square 

Second Phase: Randomised block with 6 plots and 6 treatments. 
The material from the six plots in a column of the latin square is 
assigned at random to plots in a block of the second phase. This is 
a degenerate case of the preceding example. 

It is of interest to note that if the designs of the first and second 
phases were reversed the row stratification of the second phase is not 
confounded with stratification of the first phase nor orthogonal to 
treatments within blocks. Analysed as a randomised block the error 
variance but not the treatment mean square would be inflated by the 
row effects of the second phase and in fact for a valid analysis the data 
would have to be analysed according to the second phase design. 

(5) This example illustrates the possibilities of more elaborate 
arrangements, not desirable in themselves, except where there is a 
potentially worthwhile economy in the use of test material to attain a 
given level of precision. 

The following unpublished data by courtesy of D. J. Goodchild, 
Plant Pathology Laboratories, University of Sydney, relates to an 
investigation of the effect of four light treatments on the synthesis of 
tobacco mosaic virus in leaves of tobacco Nicotiana tabacum var. 
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Hickory Pryor. Four suceessive leaves at defined positions on the 
stem were taken from each of eight plants of comparable age and vigor 
after inoculation with buffered and diluted sap expressed from infected 
tobacco plants. The eight plants were topped and kept under a constant 
and continuous light intensity for 48 hours prior to inoculation. 

Arbitrarily grouping the plants into two sets of four, the four 
treatments were applied to the leaves, which had been separated from 
the plants and were sustained by flotation on distilled water, in a latin 
square design to each set with plant source as columns and leaf positions 
as TOWS. 

After treatment, virus content of each leaf was assayed by ex- 
pressing sap, diluting with phosphate buffer to an appropriate dilution 
and inoculating half leaves of the assay plants, Datura stramonium, 
on which countable lesions appeared. Dilutions from leaves belonging 
to the first column in the latin squares of each set were regarded in 
effect as eight treatments which were assayed in a 4 X 4 graeco-latin 
design using half leaves at four consecutive positions on four assay 
plants, treatments from a column within a set belonging to the same 
alphabet. Similarly for the leaves belonging to the second, third and 
fourth columns of the first phase sets. 

In Table 1(a) the plan of assignment of treatments to plants and 
leaves within plants is given together with a plot number which is used 
to identify the source of virus for the half leaves of Table 1(b).  In- 
cluded also in 1(b) are the square roots of counts which were transformed 


TABLE 1(a) 
First Phase: Two 4 X 4 latin squares 


TEST PLANTS TEST PLANTS 
Leaf Leaf 
Position 1 2 3 4 Position 5 6 7 8 
a b c d a b c d 
1 90.8 116.7 84.9 64.4 1 61.5 69.1 76.2 64.5 
1 5 9 13 17 21 25 29 
b a d c e d a b 
2 66.9 49.7 77.3 72.7 2 89.8 88.1 54.4 55.3 
2 6 10 14 18 22 26 30 
e d a b d e b a 
3 91.2 92.5 755 56.5 4} 101.5 84.7 81.7 64.0 
3 7 ll 15 19 23 27 31 
d c b a b a d € 
4 85.4 91.5 83.2 60.7 4 78.6 78.0 71.6 68.1 
4 8 12 16 20 24 28 32 
a b c d 


Treatment Totals 534.6 608 .0 659.1 645.3 
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TABLE 1(b) 


Second Phase: Four 4X 4 yraeco-latin squares 


ASSAY PLANTS ASSAY PLANTS 
Leaf Leaf 
Position 1 2 3 4 Position 5 6 7 8 
1 24.9) 2 12.7) 3 17.5) 4 16.1! 28.016 11.5] 7 26.4] 8 17.1 
1 1 
17 18.5/20 12.3)18 17.2]19 19.3 23 18.0/22 21.3)24 17.83/21 11.1 
2 24.5] 1 23.1) 4 22.5] 3 25.8 8 25.816 10.415 25.7 
2 2 
18 31.6/19 24.3/17 11.9]20 18.2 22 23.6)23 22.2121 15.1)24 18.7 
3 30.5] 4 22.2) 1 27.7} 2 18.7 7 20.1] 8 32.0} 5 34.916 13.2 
3 3 
19 32.3/18 22.8/20 23.7]17 13.7 21 18.6/24 26.7/22 20.2/23 22.3 
4 24.613 17.4/2 15.0)1 15.1 6 14.6] 5 28.1] 8 20.7] 7 20.1 
4 4 
20 24.4|17 17.4|19 25.6/18 18.2 24 14.821 24.3)23 22.222 23.0 
ASSAY PLANTS ASSAY PLANTS 
Leaf Leaf 
Position 9 10 11 12 Position 13 14 15 16 
9 23.2)10 154/11 12.7/12 17.3) 13, 13.814 18.415 13.1)16 9.4 
1 1 
| 
28 20.325 14.8|27 M 30 92/31 21.9129 12.1192 6.8 
10 24.2} 9 14.7)12  17.2)11 25.5] 16 16.7/14 21.5/13 13.3 
2 2 
27° 25.7/26 11.7/28 15.6|25 31 17.0)30 13.60/32 19.2/29 16.9 
20.12 21.51 9 19.51/10 22 15 15.416 21.1/13 18.1114 17.2 
3 
26 16.1|27  23.2|25 17.1/28 18.7) 32 22.3\29 18.031 13.8)/30 14.6 
12 27.211 16.210 15.7, 9 27.5) 14. 15.6/13 19.2)16 11.3 
4 | 4 
25 26.9/28 17.0126 11.7|27 18.2) 17.5/32 19.830 17.9/31 11.3 


to equalise variance within treatments, and the sums of counts of the 
same identification number are recorded in 1(a). Thus the total for 
the first leaf of Plant 5 in the first phase is 61.5, given by the sum of 
18.5, 11.9, 13.7 and 17.4. 

So far as the leaves of either latin square set are concerned the plots 
within columns are tested in latin squares in the second phase. This is 
a design of the type given in Example (2) and the contribution to latin 
square error and treatment and row mean squares within sets is the 
within alphabet variance of the graeco-latin square. However the 
latin column mean square is inflated in addition by the between graeco- 
latin squares contrast for a single alphabet. 

It is not obvious however that the contribution to the mean square 
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for treatments taken over both alphabets is also the within alphabet 
variance. The contrast between treatments a, b is the contrast of plots 
identified as 1,17 and 2,20 in the first graeco-latin square and so on. 
If A be a measure of deviation from the norm for a particular whole 
leaf and ¢ for a half leaf within a whole leaf then the elements of contrast 
from these sources of variation in the first graeco-latin square are 


(1) (17) (2) (20) 
Anté10 An + Ais + Ata + 
Ass + Asa + Ass + Ass + 
t+ Aaa + Aus + Agr + 


where the subscripts give the row and column position respectively in 
this first square. Similarly for the remaining graeco-latin squares. 
Collecting coefficients of the different elements of variation, squaring, 
adding and dividing by 64, which is the number of contrasted half 
leaves for these treatments in the second phase, the expectation of the 
contribution to variance from these sources is os + o.. The within 
alphabet variance which can be derived from the graeco-latin square 
analyses is an estimator of this. 

As the contribution of variance from the graeco-latin squares to 
treatments is the same within and over the two sets, it follows that 
the interaction of treatments with sets also has this component of 
variance and therefore the pooling of this interaction with the latin 
square error within sets in the conventional analysis of duplicated latin 
squares is an unbiased procedure. The only source of variance in the 
analysis of the sets which does not include a contribution from the 
graeco-latin squares equal to the within alphabet variance is the mean 
square for the set contrast. 

The formal analysis of the two primary sets is given in Table 2. 
Included also is a pooled analysis of the graeco-latin squares, following 
Yates, to provide estimates of second phase components. 

Components of variation contributing to the mean squares are 
listed. They are, in order 


oy — variance between treatment means 

orp — variance between test plants 

orr — Variance between leaf positions in test plants 
oi — residual variance between leaves of test plants 
cap — Variance between assay plants 


i 
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oar — variance between leaf positions in assay plants 
ao, — residual variance between whole leaves of assay plants 
a. — variance between halves within leaves of assay plants 


The above notation is strictly appropriate for random variates. 
With constants for treatments and for positions within test and assay 
plants and with only additive effects, o; , for example, stands for 
(V. — V)*/(4 21). 

_ The treatment mean square is significant at the 5% level. The 
means of treatments in half leaf units are a, 16.71; b, 19.00; c, 20.60; 
d, 20.17 with S. E. of (30.25/32)! or 0.97. 

For the assay plants the mean squares for plants and leaf position 
are significant at the 1% and 5% levels respectively. For the test 
plants the mean square for plants contains a component due to differ- 
ences between assay plants not present in the error mean square. The 
effect of leaf position on test plants is not statistically significant. 

By equating the mean squares to their expectations expressed in 
components, estimates of the variances of these components may be 
determined. Thus the estimate of o; is } (11.80-7.60). For o7p the 
sum of squares for sets and plants within sets were pooled and likewise 
the coefficients of variances of components corresponding to these sums 
of squares. The estimate of 7p was then determined by elimination of 
the contributions from other components using the estimates of these 
from other mean squares. For the majority of estimates the error of 
estimation will be considerable. The estimated values given at the 
foot of Table 2 will however be used in the discussion for illustrative 
purposes. 

Discussion 

Modification of Design 

Provided that there is replication at the second phase so that an 
estimate of the variances associated with the various stratifications 
can be estimated in both phases as in the numerical example, it is 
possible to predict from the data of one or more trials the likely conse- 
quences of modifying the design. The several arrangements which 
follow have been chosen to illustrate the type examples given earlier 
and are correspondingly numbered. In each instance a treatment is 
assayed by 32 half leaves in the second phase so that the variances of 
treatment means are the corresponding error variances in half leaf 
units divided by 32. 

1. First Phase: Two4 X 4 latin squares 

Second Phase: Complete randomisation of four assays from each 
first phase plot within the 128 half leaf positions. 
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2. First Phase: Randomised blocks, 4 treatments within each of 8 
plants 


Second Phase: Treatments within plants of the first phase are 
assayed in random order in each of two plants in the second phase, 
using both halves of a leaf for the same virus source. 

3. First Phase: Two4 X 4 latin squares 

Second Phase: Each first phase latin square is duplicated in the 
second phase with one-to-one correspondence, using both halves of a 
leaf for the same virus source. 

4. First Phase: Two4 X 4 latin squares 

Second Phase: Treatments within plants of the first phase are 
assayed in random order in each of two plants in the second phase, using 
both halves of a leaf for the same virus source. 

The error variances in half leaf units for these modifications are 
expressed in terms of the components and estimates are given by 
substituting the estimates of variance from the numerical analysis. 


Modi- Components of Error Variance Esti- 
fica- mated 


All of these modifications are less efficient than the actual experi- 
mental design. 


Modification of amount of replication 


This is a particular case of modification of design. Factors limiting 
increase in replication in the two phases and relative cost in time and 
labour may be quite different. In the experimental situation discussed 
in Example 5 it is not physically convenient to increase replication in 
the first phase beyond eight plants for any one trial but the effect of 
changes in the amount of replication in the second phase could be 
considered. The error variance for treatments is not changed if the 
design is modified by using adjacent pairs of latin square columns to 
be the alphabets of a graeco-latin square in the second phase. This 
makes possible a comparison between variances of treatment means for 
four, six and eight graeco-latin squares in the second phase. 


| 
Variance 
1 4 120/127 | 96/127 | 126/127 1 39.98 
2 4 4 2 2 1 | 46.64 
3 4 2 1 32.36 
4 4 2 2 1 38.36 


TWO PHASE EXPERIMENTS 


333 


Treatment 
Second | replication Components of Variance 
phase in half 


Estimated 
error 
variance 


Estimated 
variance 
of treat- 

ment means 


4 squares 32 
6 squares 48 
8 squares 64 


0.946 
0.845 
0.794 


There is a psychological hazard in two phase experiments that the 
second phase will not be recognised as only an assay. The position is 
strictly analogous to the difficulty sometimes experienced in realising 
that sampling variation within plots is not of direct relevance to the 
comparison of treatments for which the appropriate error variance is 


the replicate error. 


The point scarcely requires illustration but one could consider the 
consequences of the following procedure with material of the type used 


in the numerical example. 


First Phase: One replication of four treatments using four leaves 


from one plant. 


Second Phase: First phase treatments assayed in random order in 
each of 16 plants of the second phase, using both halves of a leaf for the 


same virus source. 


Failure to replicate treatments in the first phase 


The gain in information by doubling the work in the assay phase is 
only 19%. With duplication of the trial as designed by repetition in 
time there would be a gain of 100%. 


Components of Variance 


Estimated Error Variance 


ignoring first 


phase components 


including first 
phase components 


17.80 


248.52 


The error variance for the comparison of the assayed leaves is 
17.80. The true error variance for the comparison of treatments is 
248.52, but this would not be ascertainable from the results of the 
experimental procedure as no estimate would be available of the average 
effect of leaf position on the test plant and the residual leaf effect. To 
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identify differences between the four leaves with treatment effects only 
and to use the error variance of 17.80 as the basis of comparison would 
be wrong in principle and could lead to quite misleading conclusions 
in practice. 


Summary 


In experiments where the effects of treatments have to be determined 
by a subsequent stage of assaying it is important to consider design in 
the assay phase in relation to the design in the primary phase. Ex- 
amples of designs which enable the assay totals deriving from material 
in first phase plots to be analysed for treatment and error mean square 
according to the first phase design are discussed. 
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KIMBALL, A. W. Short-cut formulas for the exact partition of x’ in 
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THE BRADLEY-TERRY .PROBABILITY MODEL AND 
PREFERENCE TASTING 


N. T. GripGEMAN 
Division of Applied Biology 
National Research Council, 

Ottawa, Canada 


INTRODUCTION 


Broadly speaking, two methods of appraising sensory differences 
may be distinguished: scoring and ranking. There are also two sources 
of sensory difference: that among the intensities of several stimuli 
identical in kind, and that among preferences in a group of stimuli that 
may or may not be of the same kind. This paper is concerned with the 
applicability of the simplest form of ranking, namely, pair comparisons, 
to testing in general and taste preferences in particular. 

In organoleptic work it is usually rewarding to postulate a sensory 
continuum whose points, S, are monotonically related to the concentra- 
tion, C, of a given stimulus in a given medium. Some controversy has 
centred on the meaningfulness of this notion, and on its right to come 
within the ambit of metrology at all; here, without discussion, the view 
will be adopted that the concept is operationally valid, and that the 
practical problem is to refine the measuring technique. A common 
further assumption is that the relation approximates to the Fechnerian 
form S = a + 6 log (C/C,), where a and £ are constants and C;, is the 
threshold detectable concentration, over a certain critical range. We 
shall not at the moment perpend this relation, but may observe that, 
whatever the true equation, its parameters are likely to be biological 
variants over the universe of tasters. 

A preference continuum is a more nebulous concept. In simplest 
form, it may be thought of as a series of points forming an ordinate of 
preference, P, to an abscissa of concentration, C, of a given stimulus; 
and it is not difficult to visualize a curve that maximizes P at some 
particular concentration. But a preference continuum must also be 
applicable to a series of stimuli that are at least partially different in 
kind. This necessitates a more complex model in which P is a function 
of an n-dimensional vector quantity. Can such a continuum be vali- 
dated? In other words, can a subject’s preference statements, in suit- 
ably selected and controlled situations, concerning a set of flavors, be 
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taken to stem from graded reactions of delectability? Is relative delect- 
ability quantifiable? These questions are important insofar as prefer- 
ence itself is technologically important; it is in fact concerned with taste 
in the everyday usage of the word. 

To arrange a given set of flavors in some sort of order whose statistical 
significance can be assessed, we can restrict attention to pair compari- 
sons, the pairs forming incomplete blocks of two. This method has 
the attraction of breaking the test down into the simplest possible 
decision units, and of not requiring graded responses from the taster. 
To make best use of the data we need a model of the relation between 
the probability of individual pair judgements and points on the postu- 
lated continuum. Such a model has been put forward by Bradley and 
Terry (1, 2). Its applicability to tasting for relative sweetness of graded 
concentrations of sugar has already been demonstrated in this laboratory 
by Hopkins (3). Part of his experimental work was on preferences 
among primary-flavor mixtures, and these too seemed to fit the model. 
This work has now been extended to cover a wider flavor range. By 
way of introduction, a sketch of the Bradley-Terry model (see, also, 
Thurstone, (4)) may help explain the design and interpretation of the 
experiment. 


THE BRADLEY-TERRY MODEL 


The probability of a taster’s judging that sample 7 contains more of 
the stimulus than sample j will depend on the sensation difference 
S,; — S; measured on the continuum. (Merely for expository conveni- 
ence, we shall argue in terms of stimulus intensity.) If this quantity is 
large and negative the probability will be small; if zero, the probability 
will be 0.5; and if large and positive the probability will be close to unity. 
For a particular pair of stimulus concentrations C; and C; , taster-to- 
taster differences in the probability of the specified judgement will be 
a reflection of ‘non-parallelism’”’ among the underlying curves of 
S = f(C), i.e., the tasters’ discriminatory powers in that sensory region 
must be heterogeneous. 

As a first approach, let us assume that the probability density, over 
all values of S; — S, for a particular taster, is normal, so that 


However, at the expense of introducing a trivial departure from nor- 
mality, a simple algebraic model of the mechanism can be set up. The 
point of departure is the observation that a certain “squared hyperbolic 
secant” function, which is slightly leptokurtic to the normal function 
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(and whose integration yields the logistic curve*), has these character- 
istics (see Fig. 1): 


A 8 
y 


FIG. 1. THE SQUARED-ITYPERBOLIC-SECANT FUNCTION. THE MAXIMUM ORDI- 
NATE IS 4%; THE AREA ENCLOSED IS UNITY. 


Hence y = In (A/B). If we now put 


A = anti-In S; = F; 
and B = anti-In S; = 7; 
we immediately arrive at the position that 
Pr {t > j} = + 73) (1) 


and we conceive of z as a function of S—although an ad hoc function 
in that the two z’s sum to unity. We may at once generalize to the case 
of ¢ samples (stimulus concentrations) for which 7, + 7, + 7; + --- 
+ m, = 1, and for any pair of which equation (1) applies. There are 
i(t — 1)/2 pairs and, therefore, that many probability estimates (as 
frequencies of specified judgements in replicate pair comparisons) from 
which the z’s can be estimated. 

These x’s, termed “ratings” (although they are of no practical 
importance as metrical ratings), have two uses: firstly, they provide a 
means of checking the model; secondly, significance tests can be derived 
from them. Maximum likelihood estimates of x are calculated from 
the ¢(¢ — 1)/2 frequency estimates f;;/n of the probabilities Pr {i > 7}, 
the equations (¢ in number) being 


where f,; is the frequency of selection (in a specified direction e.g., 


*Thus the distribution difference is exactly that underlying the logit—probit issue (or, to keep 
abreast of current Berksonian terminology, the logit-normit issue). 
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“the stronger flavored”) of 7, summed over all pair comparisons. As, 
by definition, _ x, is unity, the equations can be solved (by iteration) 
for the unknowns. 

A given set of ¢ frequency sums may arise from any one of many 
different combinations of the t(t — 1)/2 individual frequencies, com- 
binations that will differ in fit to the requirements of the rating model. 
From the model we can estimate the expected ratings, and hence the 
expectations for the individual frequencies. Finally, from the observed 
and expected cell frequencies (half each of f;; and f;;), Bradley has 
shown how to evaluate an index of discrepancy that closely approxi- 
mates x’ with ¢(¢ — 1)(t — 2)/2 degrees of freedom. 

The use of # to test for significant sensory differences among the 
samples stems from the following considerations: The simplest alter- 
native hypotheses are: equality (H,) versus general but undetailed 
inequality (H,). If H, is true, the probability of each of the possible 
sets of ¢ frequency sums, as a chance o¢currence, can be derived from 
first principles, although as ¢ and n increase the computational work 
soon becomes burdensome. All the sets can be listed in such order that 
their cumulative probabilities provide significance levels for the accept- 
ability of the null hypothesis, an order determined by the likelihood- 
ratio test statistic 

=n log + — Li log 


whose minimum is zero and whose maximum is associated with the unit 
end-point of the probability accumulation. Hence the cumulative 
probability corresponding to any particular B, is that of its not being 
exceeded if H, is true. 

Bradley and Terry have tabulated all x, B, and the cumulative 
probabilities for ¢ = 3; = 1(1)10, and fort = 4;n = 1(1)6. These 
tables are not conspicuously easy to enter as they stand, a defect easily 
remedied however by the construction of a new (first) column of squari- 
ances (sums of squares of deviations from the mean f;) to serve as a 
pathfinder. When n is large, say > 2", Bradley and Terry recommend 
the evaluation of B, and the likelihood ratio as usual; then, since the 
colog of the latter may be regarded rs x’/2, significances are ascertain- 
able without recourse to special tables. As n increases, the probability 
that a set of observed frequency sums could arise fortuitously from 
the null hypothesis ever more cl.sely approximates a smooth mono- 
tonic function of the squariances. In other words, x’-type curves seem 
to be approached independently of B, and therefore of the ratings and 
the model. Thus, in the limit, the fit of the model is immaterial to 
inter-sample significance testing, which can be done directly with 
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fi/n — nt(t — 1)°/4 


regarded as a statistic distributed as x’. 

On the other hand, the larger the scale of the experiment, the better 
the test of the model fit will be, so that an investigation into the applica- 
bility of the method to small-scale work should be conducted with high n. 
This condition also enables inter-sample significances to be assessed 
before the ratings etc. are calculated, thus permitting immediate rejec- 
tion of undiscriminated sets of judgements, none of which, of course, 
can yield any information about the fit of the model. 


EXPERIMENTAL 


Four small flavor modifications were introduced into each of two 
basic materials, fruit juice and meat. Thus six paired taste contrasts 
were obtained for each material. Six subjects made 20 replicate prefer- 
ence judgements on each of the 2 X 6 pairs. Altogether, therefore, 
1440 decisions were made. 


TABLE I. 
Constitution of the Eight Samples 
(in parts) 
Reference Fruit juice Ground meat 
A 98 orange + 2 lemon 50 beef + 50 pork + 0.2 salt 
B 90 orange + 10 apple 50 beef + 50 pork 
' + 0.1 tenderizer 
Cc 95 orange + 5 lemon 524 beef + 474 pork + 0.2 salt 
D 85 orange + 5 lemon 52} beef + 473 pork 
+ 10 apple + 0.1 tenderizer 


Canned sweetened orange juice was modified by admixing canned 
apple juice and lemon juice. The meat samples were formed of blends 
of ground beef and pork, the additives being salt or a commercial 
tenderizer; these blendings involved slight textural disuniformities. 
The composition of the 8 samples is given in Table I. The juices were 
served in 30 ml. aliquots at 60°F. The meat was formed into one-ounce 
patties, broiled for 35 minutes, and served hot. 

A tasting schedule was drawn up to allow every subject to make a 
complete set of 12 coded comparisons per day, half each at mid-morning 
and mid-afternoon sessions. The procedure was repeated for 22 daily 
sessions, the first two of which, however, were dummy in the sense that, 
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TABLE II. 


Recorded Frequency f;; of Specified Preferences in 20 Replicate Pair Comparisons 


Subject reference: 
Material 


VVVVVVIVVVVV 
SSaga 


unknown to the subjects, the results were discarded. The order of 
presentation of any 6 comparisons over the two daily sessions was 
randomized, with the restriction that, overall, each pair was presented 
10 times with one sample on the right-hand side, and 10 times vice 
versa. Preferences were recorded on simple forms. 


RESULTS AND ANALYSIS 


All preferences, as frequencies of choice (f;;) in a specified direction, 
are assembled in Table 2. Summed preferences per sample (f;) are 
given in Table 3 together with the corresponding x’ (= >> f/20 — 180) 
for the discrimination of each subject for each material. 


Homogeneity Tests 


Possible inter-subject differences were tested by means of Haldane’s 
treatment of 2 X N contingency tables when some expectations are 
small (5). The preferences, and their complements to 20, over all 6 
subjects, were arrayed for each of the six pairs of both materials, and 
the 12 values of x’ calculated. The first and second moments, k, and 
k, , of the corresponding conditional distributions of x’ for the marginal 
totals were then obtained from Haldane’s formulae, which may be 
written: 


340 
preference I II Ill IV V VI 
A>B 12 4 9 10 5 14 
: ll 17 16 8 14 8 
Fruit 16 13 17 13 13 9 
juices 6 15 17 11 19 3 
10 19 15 13 17 3 
13 7 8 8 11 16 
9 18 12 - 15 17 18 
9 11 7 10 10 9 
Meats 15 17 12 14 18 16 
5 6 | 3 4 9 
7 7 10 10 11 13 
14 14 12 13 13 ll 
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TABLE IIL. 


Summed Preferences Frequencies fi; , and x? (df = 3) for Discrimination 


Subject reference: 


Material Sample 


*: exceeding 5% point of x?. 
**: exceeding 1% point of x. 


k, = SN 


and k, = (S — 1)(S — 2)(S — 3) (; — 1 a5) 


where N, the number of subjects, is 6, and S, the total number of judge- 
ments, is 6 X 20, and A = S — Bis the sum of all subjects’ preferences, 
in the specified direction, for the particular pair comparison. The 
moments, summed over each material, were then used to express in 
standard measure the differences between the observed values of x? 
and their expectation for homogeneity. The deviates turned out to be 
13.1 for the fruit juices and 1.1 for the meats. So it appears that these 
subjects had remarkably similar tastes for meats, and remarkably 
dissimilar tastes for the fruit juices. 

A plot of the aggregate frequencies over subjects and pairs for each 
day and each material showed no visible secular trend. To test for 
general heterogeneity among replicates, Cochran’s index of discrepancy 


was employed (6). Here n is the number of replicated sets of pair 


& 
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| 
I II Ill IV Vv VI | 
' A 39 34 42 31 32 31 209 
Fruit B 24 50 43 34 51 12 214 
juices Cc 36 15 15 29 18 45 158 
D | 2 21 20 26 19 32 139 
x? = | 11.7** | 36.1** | 31.9%*| 1.7 |35.5** | 27.7%* | 34.7%* 
F A 33 46 31 39 45 43 237 7 
B 23 15 22 18 18 24 120 
Meats Cc - 40 37 41 40 39 33 230 
D 24 22 26 23 18 20 133 
x2 = 9.7* |29.7**| 10.1* | 18.7** | 29.7** | 15.7** | 96.3** 
| 
| 
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TABLE IV. 
Indives, Q, of Discrepancies Between Replicates in Preferences 


Subject reference: 
Material 


Ill 


Fruit juices 20.5 : 20.7 : 26. 
28. 


3 
Meats 18.1 ; 18.6 0 


Percentage points of x? for 19 degrees of freedom: 
5%: 30.1; 1%: 36.1 


comparisons, i.e. 20; f, is the number of preferences (in the specified 
directions) in any one set; and f;; is the number of preferences in the n 
replicates of any one pair. The limiting distribution of this index is 
that of x’. The Q values are assembled in Table 4, where it will be 
seen that none of the 12 is high enough to suggest a discrepancy. 


The Bradley-Terry Fit 


If the kind of preference continuum discussed above is real, we may 
calculate expected frequencies, as already shown, from the rating 
estimates. This having been done, goodness of fit of the observed 
frequencies was tested by Bradley’s suggested method (7) of calculating 
an index of discrepancy >> [(f — f’)?/f’], the summation extending over 
all preferences (and their complements to 20) for each subject with 
each material. For perfect fit, this index is distributed as x’ with 3 
degrees of freedom. The results are shown in Table 5, from which it is 
apparent that, in general, the fit is good. Of the 12 values of x’ two lie 
just beyond the 5% point, but as the 6-subject totals, at 19.7 for the 


TABLE V. 
Values of x? for Goodness of Fit of Preferences to the Bradley-Terry Model 


Subject reference: 
Material 


I III IV 


Fruit juices 81 


1 
Meats 6.04 Ry 


5 3.74 
a: 


0. 
60 58 


Percentage points of x? for 3 degrees of freedom: 
5%: 7.82; 1%: 11.35 


al 
| 342 
VI 
17.5 
29.0 
VI 
1.49 3.49 
3.40 8.11 
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juices and 22.8 for the meats, are little in excess of the expectation of 18, 
the two high values are probably fortuitous. 


DISCUSSION 


Care must be exercised in the interpretation of tests of goodness of 
fit of the Bradley-Terry model. Even if the model is not the best, it is 
unlikely to be far from truth, so that, as already intimated, an experi- 
ment of normal size could hardly be expected to yield evidence of misfit. 
Furthermore, the greater the sensory difference induced by the compared 
samples, the greater the likelihood of very small cell frequencies, with 
consequent inflation of the index of discrepancy. However, the large- 
scale trial already made (3) on sweetness intensity has shown excellent 
agreement with the model, and the present work indicates that the 
model applies equally well to sensory preferences. The conclusion 
that a preference continuum, analogous to a continuum of sensation 
intensity, is, at least in some circumstances, a workable concept, is the 
most important single outcome of the investigation. 

Inter-subject preference variation was of course not unexpected, 
yet it emerges that there was close agreement about the delectability 
of the meat samples—those without tenderizer were preferred. 

Individual preference for the fruit juices varied, but in general the 
addition of 5% lemon juice to orange juice was less favored than the 
addition of 2% lemon juice or 10% apple juice. That there should 
be more agreement over a desirable quality of meat than over that of 
fruit juice mixtures is perhaps understandable. 
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SUR LA DETERMINATION DE L’AXE D’UN NUAGE 
RECTILIGNE DE POINTS 


GrEorGEs TEISSIER 
Station Biologique de Roscoff 


_ Au cours d’une étude de la relation d’allométrie développée devant 
la premiére Conférence internationale de Biométrie (1948) et parue 
ici-méme, j’ai été conduit 4 examiner le probléme de la liaison linéaire 
de deux variables jouant des réles symétriques, telles par conséquent 
qu’aucune d’elles ne peut légitimement étre considérée comme indé- 
pendante. Transportant dans le domaine du calcul une technique 
d’interpolation graphique classique, j’ai proposé de représenter la 
relation cherchée par la droite qui rend minimum la somme des aires 
des triangles rectangles ayant cette droite pour hypoténuse commune, 
deux cétés paralléles aux axes, et leurs sommets aux points X,, X,. La 
forme de |’équation ainsi obtenue: 


X,-X,_X,-X; 


incite a écrire, si l’on étudie n grandeurs: 


X.—X,_ Xs— Xs 


03 


(D) 


expression qui donne simultanément, par |’équation d’une droite de 
l’espace 4 n dimensions, toutes les relations existant entre les variables 
prises deux 4 deux, trois 4 trois . . . . (1948, p. 54). 

Les propriétés de cette droite ont été étudiées depuis par Kermack 
et Haldane (1950), pour le cas de deux variables, et par Kruskal (1953), 
pour celui de n variables. Mais, s’il ne semble pas utile de revenir 
longuement sur le cas de deux variables, il est certain que le probléme 
général mérite d’étre repris avec plus de précision. Le présent travail 
développe une communication sur ce sujet présentée en mon nom par 
G. Darmois 4 la troisitme Conférence internationale de Biométrie 
(1953). Un court résumé en a été donné ici-méme (1954). 


Je rappellerai seulement, pour le cas de deux variables, que le choix 
d’une ligne de regression traduisant au mieux les résultats d’un ensemble 
de mesures, dépend des conditions dans lesquelles ces mesures ont été 
pratiqués, de la nature et de la grandeur de |’incertitude qui pése sur 


344 


— 
on 
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ces derniéres, et aussi du but du travail poursuivi. Dans bien des cas, 
l'objet de la recherche est moins de fournir une régle de prévision, que 
d’extraire des données d’observation une expression approximative de la 
loi fonctionnelle qui unirait les deux variables comparées, si toutes les 
causes de perturbations pouvaient étre éliminées. C’est ce probleme 
que j’avais tenté de résoudre dans le cas de la relation d’allométrie, sans 
peut-€tre avoir montré assez clairement qu’il s’agissait 14 d’un cas par- 
ticulier de ce que Quenouille (1952) nomme trés justement la recherche 
de la “loi sous-jacente” 4 un ensemble de résultats statistiques. Ren- 
voyant & |l’excellent livre de cet auteur pour tout ce qui concerne 
les généralités sur cette question, je rappellerai seulement que, dans le 
cas de deux variables, les renseignements dont on dispose ne permettent 
généralement pas de donner une solution unique au probléme posé. Il 
en va autrement, comme nous allons le voir, lorsque |’étude porte 
simultanément sur un plus grand nombre de variables. 

Pour préciser notre recherche, en lui donnant en méme temps une 
forme simple, imaginons que nous ayons pratiqué n mesures sur chacun 
des éléments d’un échantillon extrait d’une population homogéne 
d’animaux adultes, et supposons que |l’espéce étudiée, un Crustacé 
Brachyoure par exemple, soit |’une de celles pour lesquelles les relations 
d’allométrie se vérifient bien. C’est dire qu’en moyenne les logarithmes 
X des mesures varient proportionnellement les uns aux autres et que 
les corrélations qui les unissent deux 4 deux sont fortes. Les points 
figuratifs X, , X, --- X, , constituent dans ce cas un nuage trés allongé 
et sensiblement rectiligne. La détermination de la forme de ce nuage 
nécessitera, dans l’hypothése d’une distribution normale, |’estimation 
des moyennes, des variances et des covariances, soit de n (n + 3)/2 
paramétres distincts, qu’il faudra ensuite combiner diversement pour 
obtenir des informations utilisables. Mais il est clair que certaines 
caractéristiques de la distribution dépassent en importance toutes les 
autres : ce sont celles qui permettent de définir la ligne sur laquelle est 
axé le nuage. Si celui-ci est trés étroit, la connaissance de |’‘axe”’ 
pourra méme suffire 4 la plupart des applications pratiques. Plusieurs 
procédés peuvent étre employés, pour estimer la position de cette droite, 
4 partir des moments des deux premiers ordres. 


Une condition de moindres carrés, imposant 4 la somme des carrés 
des distances des divers points figuratifs 4 la droite cherchés d’étre un 
minimum, conduit 4 adopter comme solution du probléme le plus grand 
axe de ellipsoides d’égale probabilité (Voir par exemple Cramer 1946). 
Comme cet axe dépend des unités de mesure adoptées, il convient d’user 
de variables réduites, ce qui raméne notre probléme au calcul de la 
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premiére composante principale de Hotelling; on détermine cette com- 
posante a partir de la matrice des corrélations, par une méthode directe 
si les variables sont trés peu nombreuses, par un procédé d’itération dans 
le cas général. On sait que la fraction de la variance totale dont rend 
compte la premiére composante principale est mesurée par la premiére 
racine caractéristique de la matrice et qu’elle est plus grande que celle 
que |’on pourrait extraire par toute autre fonction linéaire des mesures 
calculée & partir des mémes coefficients de corrélation. C’est en ce sens 
que cette droite (D’) peut étre considérée comme étant la droite de 
meilleur ajustement. Mais on doit noter que si (D’) passe plus prés des 
points observés qu’aucune autre droite, elle ne permet pas, en revanche, 
de restituer les coefficients de corrélation avec une précision égale & 
celle que |’on peut obtenir par d’autres procédés. 


Le probléme peut en effet étre abordé d’une autre maniére. Si nous 
admettons a priori qu’il existe une droite (D’’) représentative du 
phénoméne étudié, nous pouvons convenir d’en donner une représenta- 
tion paramétrique et écrire, pour chacun des X une relation X, — X, = 
a, (T — T) ot T est une variable auxiliaire, le facteur général, qu’il 
reste 4 définir, et a, le coefficient de regression de X, en T égal a 
Tr, 9,/Or, OU Tz, est le coefficient de corrélation de X, et T. Les écarts 
individuels aux n relations ainsi définies sont indépendants de 7’; s’ils 
sont également indépendants entre eux, ce qui n’est nullement nécessaire, 
mais peut arriver, on doit avoir, pour chacun des n (n — 1)/2 coefficients 
de corrélation une relation de la forme r,, = rz, Tr, et il s’agit d’estimer, 
& partir de ce nombre surabondant d’équations, les n coefficients rz, . 
Ces saturations en facteur 7’ des n variables sont données par une 
formule due 4 Spearman et |’on en déduit immédiatement une définition 
de T qui permet d’en estimer la valeur pour tout individu. Cette 
estimation du facteur général est d’ailleurs peu utile, 7’ n’étant qu’un 
intermédiaire dans des calculs oi: sa forme analytique n’intervient pas 
explicitement, et pouvant étre éliminé sans inconvénient de |’expression 
du résultat final qui s’écrira: 


7,03 


équation d’une droite (D’’) qui différe 4 la fois de (D) et de (D’). La 
solution obtenue épuise toutes les informations dont on dispose sur la 
variance liée des n grandeurs et restitue, aux erreurs d’échantillonnage 
prés les différents coefficients de corrélation. L’estimation qu’elle 
donne des différents X est, en revanche, moins précise que celle que 
permet d’obtenir la droite (D’). 


| 
( 
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Dans le cas général, les écarts individuels des diverses variables 
aux lignes de regression de X en 7 ne sont plus indépendants et un 
facteur unique ne suffit plus 4 expliquer les corrélations observées. Le 
probléme est alors d’extraire un facteur général d’un ensemble de 
données justiciables d’une analyse factorielle. I] peut étre résolu par 
des procédés trés semblables 4 ceux que !’on utilise pour la recherche de 
la premiére composante principale. Comme dans le cas d’un facteur 
unique, la droite (D’’) restitue les coefficients de corrélation mieux que 
la droite (D’) et les différents X moins bien qu’elle. 


On doit alors se demander quelle est, des droites généralement dis- 
tinctes (D’) et (D’’) celle qu’il convient de retenir comme solution au 
probléme posé. Mais la réponse 4 cette question ne peut pas étre 
immédiate. Les calculs qui donnent les équations de l’une ou de |’autre 
des deux droites devraient en effet normalement se poursuivre par 
l’extraction d’autres composantes ou d’autres facteurs qui seraient 
nécessaires 4 |’interprétation compléte des faits observés. En arrétant 
notre analyse 4 sa premiére étape, et en limitant notre étude a la 
recherche d’une droite d’ajustement, nous sacrifions nécessairement 
une part plus ou moins grande de |’information incluse dans les données 
dont nous disposons, part qui n’est pas exactement la méme dans les 
deux méthodes que nous comparons. Pour préciser ce point, nous 
examinerons d’abord quelques cas particuliers.’ 

Pour deux variables, |’analyse factorielle n’est pas utilisable : on ne 
dispose en effet, pour calculer deux saturations que d’une seule équation 
de définition r,. = rr,rr2. Il en résulte que (D’’) peut étre l’une quel- 
conque des droites comprises entre les lignes de régression de X, en X, 
pour laquelle r;, = let rr. = r,.,et de X, en X,, pour laquelle r;, = 1 
et Tr: = Ti2 , la droite (D’) superposée 4 (D) correspondent & rz, = 
= Vr. Ce résultat montre de facon particulitrement claire 
l’incertitude fondamentale du choix de la relation linéaire unissant deux 
variables et la nécessité d’une hypothése complémentaire, explicite ou 
implicite, sur le rapport des variabilités propres des deux grandeurs 
comparées, 


1T] ne saurait étre question d’aborder ici une étude comparative des caractéristiques théoriques de 
la méthode des composantes principales et de l’analyse factorielle, pour laquelle je renverrai 4 des 
ouvrages tels que ceux de Thompson (The fuctorial analysis of Human Ability) ou de Burt (Factors of the 
Mind). Rappelons seulement que les techniques de calcul employées dans les deux cas sont les mémes, 
4 cela prés que pour aboutir aux composantes principales, il faut que les éléments placés dans la diago- 
nale principale de la matrice des corrélations soient tous égaux 4 l'unité, tandis que, pour aboutir aux 
facteurs, il faut que ces éléments diagonaux, les communautés, soient estimés par approximations suc- 
cessives. La trace de la matrice, somme des éléments diagonaux, a, dans le premier cas, sa valeur maxi- 
ma n et, dans le second, la valeur minima compatible avec une solution réelle du probléme posé, les deux 
techniques correspondant ainsi, en quelque sorte, 4 deux interprétations extrémes des faits observés. 
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Pour n = 3, les trois droites sont généralement distinctes. La 
solution du systéme des trois équations de définition de T est donnée 
Par = 13/723, = = ‘Si ces saturations 
sont inférieures 4 1, c’est-a-dire si les trois coefficients de corrélation 
partielle rj2.3, 713.2 , 23-1 , sont positifs, la solution ainsi obtenue est 
acceptable et le facteur général suffit 4 expliquer les corrélations 
observées. Si l’une ou |’autre des saturations est plus grande que 1 
(cas Heywood), deux facteurs sont nécessaires et la solution est in- 
déterminée. Dans le premier cas, le plus fréquent de beaucoup, (D’’) 
peut s’écrire: 

X, — X, — X; — X; 


= = 
G2 03 


équations remarquables en ce qu’elles donnent pour la relation existant 
entre X, et X. une expression qui ne dépend pas de r,. mais bien de 
113 et 723 : l’introduction d’une troisiéme variable léve ainsi l’indétermina- 
tion qui pesait sur la relation existant entre les deux premiéres. Les 
lignes de régression de X, en X, et de X, en X, correspondent re- 
spectivement aux cas ot les coefficients de corrélation partielle r.;., et 
113-2 sont nuls; les trois droites (D), (D’) et (D’’) ont méme projection 
sur le plan X,X, lorsque 7; = 12;. Elles se superposent lorsque les 
trois coefficients de corrélation sont égaux. 


Ce dernier résultat se généralise immédiatement, pour d’évidentes 
raisons de symétrie, au cas d’un nombre quelconque de variables: 
lorsque les coefficients de corrélation de tous les X pris deux 4 deux sont 
égaux, et dans cette hypothése seulement, les droites (D’) et (D’’) se 
superposent 4 (D). Elles ne coincident d’ailleurs pas point par point, 
la valeur estimée du facteur général ¢ relatif 4 un individu n’étant pas 
identique a la valeur calculée correspondante f de la premiére compo- 
sante principale. 

L’équation caractéristique de la matrice des corrélations est facile 
& résoudre dans ce cas. La racine correspondant au plus grand axe est 
égale 4 1 + (n — 1)r, r étant le coefficient de corrélation commun de 
toutes les variables; les (n — 1) autres racines, correspondant 4 autant 
d’axes secondaires de méme longueur et de direction indéterminée, sont 
égales 4 (1 — r), quel que soit n; les ellipsoides d’égale probabilité sont 
donc d’autant plus allongés, pour une méme valeur de r, que n est plus 
grand. Le variance moyenne qui subsiste, une fois que la premiétre 
composante est fixée, et qui est imputable 4 l’ensemble des (n — 1) 
composantes négligées est égale 4 (1 — r) (n — 1)/n; la valeur estimée 
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du coefficient de corrélation est, dans les mémes conditions, r +(1 — r)/n. 
Les équations de la droite (D’) sont: 


La valeur de F correspondant 4 un individu dont les mesures sont 
°°* Ly eat: 


1 — X; 


, 


ce qui donne pour les valeurs théoriques x{ , 22 --- x, correspondant 
aux valeurs observées 2, , 


L’analyse factorielle de la méme matrice est compléte aprés ex- 
traction du facteur général. La variance résiduelle moyenne imputable 
aux facteurs spécifiques est (1 — 1) quel que soit n; les coefficients de 
corrélation sont exactement restitués. Les équations de (D’’) ont la 
méme forme que celles de (D’), 7, estimation du facteur général y 
remplagant F, mesure de la premitre composante principale. La 
valeur de 7 correspondant l’individu x, , --- 2%, est estimée par 
régression a: 


ce qui donne pour x;’ , x’ ---+ 2%’ , valeurs calculées par le facteur 
général pour 2, , 22, 


-X, 


G2 


On voit que 7 différe de F, comme xj’ , xi’ , different de 
Ti ,%2, +++ &, ce qui explique que les résultats d’une estimation de la 


variance liée, ou du calcul d’un coefficient de corrélation restitué, ne 
soient, pas les mémes avee les deux méthodes. D’une facon phuis précise: 


1+ (n— 1)r’ 


1 
2. 
| 
| 
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les rapports étant d’autant plus proches de 1 que r est lui-méme plus 
voisin de 1 ou que n est plus grand. La différence entre les valeurs 
estimées de la variance résiduelle moyenne calculées par la premiére 
composante principale d’une part et par le facteur général d’autre part, 
est — (1 — r)/n, l’estimation fournie par |’analyse factorielle étant 
1 — r. La différence correspondante pour les coefficients de corrélation 
est — (1 — r)/n, la valeur exacte donnée par |’analyse factorielle étant 
r. Les estimations fournies par les deux méthodes convergent donc 
‘lorsque r tend vers 1 ou lorsque n croit. Si, r restant fixé, n augmente 
indéfiniment, la valeur limite est celle qu’avait donnée d’emblée l’analyse 
factorielle. . 


Il existe un dernier cas particulier important, dont nous avons 
d’ailleurs déja fait mention en définissant la droite (D’’). C’est celui 
ou les coefficients de corrélation peuvent tous étre mis sous la forme d’un 
produit de saturations r,, = rr, rr, , ou, autrement dit, la matrice des 
corrélations est strictement hiérarchique. Dans ce cas, les trois droites 
sont distinctes et ont chacune leur signification propre. (D) qui ne 
dépend pas des coefficients de corrélation, peut cependant étre considérée 
comme définissant la relation qui unirait les n variables si, les distri- 
butions marginales restant inchangées, les coefficients de corrélation 
Tp, devenaient tous égaux 4 leur valeur moyenne r. L’analyse factorielle 
étant compléte aprés extraction du premier facteur, les coefficients de 
corrélation sont exactement restitués, aux erreurs d’échantillonnage 
prés et la droite (D’’) peut étre considérée comme rassemblant le maxi- 
mum d’information sur la structure de la distribution totale. Cette 
droite présente par ailleurs un caractére d’invariance, qu’elle partage 
avec (D) et que ne posséde pas (D’). Si l’on ajoute en effet a la série des 
mesures déja faites, celles d’une nouvelle grandeur X, et si celle-ci peut 
étre caractérisée par une saturation r,, telle que pour tout X, on ait 
Tr. = Tr, Tr, , les équations de la nouvelle droite (D’’) s’obtiendraient 
simplement en complétant la série des égalités déja écrites par un nouveau 
terme (X, — X,)/rr, ¢,. Dans les mémes hypothéses, la droite (D’) 
devra étre entitrement recalculée, mais il est facile de voir que (D’) 
différe d’autant moins de (D’’) que le nombre des variables est plus 
grand. 

Si a? = > r7,/n est la variance moyenne imputable au facteur 
général, et si l’on suppose les X rangés dans |’ordre décroissant des satura- 
tions, la plus grande racine de |’équation caractéristique de la matrice 
des corrélations est, en effet, comprise entre 1 + na” — rz, et 1 + na? — 
ry, et différe peu par conséquent de 1 + (n — 1)d’ ou, ce qui nos hypo- 
théses revient pratiquement au méme, de 1 + (n — 1)7. Les (n — 1) 


i 
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autres racines différent les unes des autres et s’intercalent entre les 
variances résiduelles rangées par valeurs croissantes (1 — rz,), (1 — r7z,), 
(1 — rz,) -+- (1 — rz,). Elles varient peu avec n et les ellipsoides 
d’égale probabilité sont d’autant plus allongés que les variables sont 
plus nombreuses. Comme dans le cas précédent, la variance qui subsiste 
une fois fixée la premiére composante principale tend vers 1 — da’ lorsque 
augmente indéfiniment. Dans la méme hypothése, les saturations et 
coefficients de corrélation calculés 4 partir de la premiére composante 
principale tendront également vers les valeurs estimées par le facteur 
général. 


A tous les cas particuliers qui viennent d’étre envisagés peut étre- 
opposé le cas général of un seul facteur ne suffit pas 4 rendre compte 
des coefficients de corrélation observés et ot ceux-ci ne peuvant étre 
restitués exactement que par le jeu combiné du facteur général et de 
un ou plusieurs facteurs bipolaires. Mais, si l’analyse est conduite par 
des techniques dérivant de la méthode des moindres carrés, comme celle 
qu’a codifiée Burt, l’estimation du facteur général est telle que la valeur 
moyenne du coefficient de corrélation calculée d’aprés ce seul facteur 
est égale 4 la valeur moyenne des coefficients de corrélation observés. 
La variance résiduelle moyenne 1 — da’ est donc proche de 1 — 7. 

Il n’est plus possible, par ailleurs, de donner une expression générale 
de la plus grande racine de |’équation caractéristique, mais une formule 
connue en donne une valeur approchée qui, dans notre notation s’écrit 
1 + (n — 1)7, valeur identique 4 celle que nous avons obtenue par un 
calcul exact dans le cas de |’intercorrélation constante et retrouvée 
comme approximation pur une matrice hiérarchique. Comme précédem- 
ment, la variance qui subsiste en moyenne une fois fixée la premiére 
composante principale peut étre estimée 4 (1 — 7) (n — 1)/n et tend 
vers 1 — 7 si 7 restant constant, n croft indéfiniment. I] zy a donc la 
encore convergence de (D’) vers (D’’). 


Nous sommes maintenant en mesure de répondre & la question que 
nous nous étions posée. La droite (D) ne peut étre retenue que si les 
différents coefficients de corrélation peuvent légitimement étre con- 
sidérés comme égaux, circonstance évidemment exceptionnelle. Les 
droites (D’) et (D’’) sont utilisables dans tous les cas et, puisque (D’) 
tend vers (D’’) et s’en rapproche d’autant plus que, les mesures étant 
multipliées, nous disposons de plus d’informations sur notre matériel, 
(D”) doit étre préférée & (D’). S’il se trouve qu’un facteur général 
puisse suffire 4 une restitution exacte, aux erreurs d’échantillonnage 
prés, des coefficients de corrélation observés, la droite (D’’) traduira 
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complétement les relations existant entre les n variables, les écarts 
entre X calculés et X observés étant indépendants les uns des autres et 
devant étre tenus pour fortuits. S’il n’en est pas ainsi et si, une fois 
extrait le facteur général, il subsiste des corrélations significatives, on 
doit conclure que la droite (D’’) n’épuise pas les informations que nous 
apportent les données sur les relations unissant les variables étudiées, 
mais la fraction qu’elle en extrait est d’autant plus grande que 7 est 
plus proche de 1. 

On remarquera que la solution qui est proposée fait disparaitre la 
difficulté, fréquente dans la pratique biométrique, du choix d’une 
grandeur de référence. Celle qui intervient dans nos calculs, F ou T, 
suivant qu’il s’agit de la premiére composante principale et de (D’), ou 
du facteur général et de (D’’), est une moyenne pondérée de tous les 
(X — X)/o, relatifs & un individu. Dans le cas ov |’on croit pouvoir 
utiliser (D), le paramétre sorepentnns serait simplement la moyenne s 
des écarts réduits des X. 


Il ne sera peut-étre pas inutile d’illustrer l’étude qui précéde par un 
exemple concret. Celui-ci a été pris dans un mémoire en cours d’impres- 
sion dans les Archives de Zoologie expérimentale et générale, et porte sur 
un Crabe Oxyrhynque, Maia squinado, sur lequel ont été mesurées les 
dimensions principales du céphalothorax et de différents appendices. 
Les X sont les logarithmes népériens P, , M, , M, , M2, M;, M, et L, 
des huit mesures pratiquées sur chacun des 301 animaux étudiés. Ces 
X ont des distributions sensiblement normales et leurs coefficients de 
corrélation, trés élevés, s’écartent peu de leur valeur moynne 7 = 0,9752, 
comme le montre le Tableau I. Le Tableau II donne les saturations 
en F et en 7, rpx et rrx , calculées par sommation et itération 4 partir 
du Tableau I, et les coefficients directeurs des droites (D) et (D’) et 
(D”), soit respectivement, cx , Trxox et Trx7x'. Les trois solutions sont 
manifestement trés voisines l’une de |’autre, fait en relation évidente 
avec la faible dispersion des coefficients de corrélation. 

La comparaison entre les résultats des différentes techniques de 
calcul est plus instructive lorsqu’elle porte sur les relations unissant 
deux variables X, et X,. Si @ est la grandeur de référence prise par 
convention comme variable indépendante, on est toujours en droit 
d’écrire: 


X, — X, = ™ % (x, X)) 


\Le facteur général étant seul recherché, les communautés utilisées dans cette analyse factorielle 
ont regu, par approximations successives, la plus petite valeur possible. Les saturations en 7, obtenues 
par une analyse factorielle complte différent d’ailleurs trés peu de celles qui sont données ici. On les 
trouvera dans mon mémoire des Archives de Zoologie expérimentale et générale. 
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TABLEAU I. 

Po Ce Mo M, M, L 
,9899 ,9892 ,9742 ,9703 ,9702 ,9616 ,9592 
Co ,9929 ,9776 ,9735 ,9722 ,9669 ,9611 
Mo ,9892 ..... ,9798 ,9757 ,9742 ,9689 ,9658 
,9742 ,9776 ,9798 ,9900 ,9896 ,9832 ,9672 
,9703 9735 ,9757 ,9900...... ,9861 ,9674 
M; ,9702 ,9722 ,9742 ,9896 ,9911. ..... ,9861 ,9660 
M, ,9616 ,9669 ,9689 ,9861 ,9861 ..... ,9592 
L ,9592 ,9611 ,9658  ,9672 ,9674 ,9660 ,9592...... 

TABLEAU II. 
XxX ox Trx Trxox Trx Trxex 
(D) (D’) (D”’) 
Po 4,8804 0, 2008 0, 9875 0,1983 0,9854 0,1979 
Co 4,1961 0,1789 0,9900 0,1781 0, 9887 0,1769 
Me 4,3386 0,1749 0,9916 0,1734 0,9908 0,1733 
M, 4,4892 0,1440 0, 99865 0,1431 0, 9934 0,1430 
M; 4,3836 0,1361 0, 9925 0,1351 0,9921 0,1350 
M; 4,2413 0,1332 0,9920 0,1321 0,9913 0,1320 
M, 4,0497 0, 1269 0,9872 0, 1253 0,98650 0,1250 
L §,1909 0, 1036 0,9788 0,1014 0,9789 0,1009 
TABLEAU III. 
xX ox Trx Trxox 'rx 
(D) (D’) (D”) 
Po 4,8804 0, 2008 0, 9867 0,1981 0,9801 0, 1968 
M, 4,4892 0,1440 0,9936 0,1431 0,9944 0, 1432 
M; 4, 3836 0,1361 0,9927 0,1351 0,99265 0,1351 
L §,1909 0, 1036 0,9840 0,1019 0,9754 0,1010 


On obtient (D) en supposant que |’on peut admettre l’égalité de r,¢ et 
‘2g - (D’) et (D”) correspondent respectivement 4G = F et G = T, 
la grandeur de référence pouvant étre définie par un nombre plus ou 
moins grand de variables. Sin = ¢ et si l’on ne se trouve pas dans le 
cas Heywood, 7 est déterminé exactement par |’adjonction d’une 
variable auxiliaire X; au couple de variables principales X, et X, et 
la solution factorelle correspond 4G = X,. Les deux droites de régres- 
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sion de X, en X, et de X, en X, correspondent respectivement a 
G = X, = X,eG = X, = X,. 

Pour appliquer ces formules au cas des Maia squinado, quatre des 
huit grandeurs étudiées P, , M, , M, et L ont été choisies et les co- 
efficients angulaires des six droites, désignées ici par (P, , L), (M, , L), 
(M,, L); (Po, M2), (M, , M2) et (Po, M,), ont été calculés par différents 
procédés et 4 partir de une, deux ou six variables auxiliaires. Le tableau 
III correspond au Tableau II, mais comporte seulement quatre lignes: 
il a été calculé a partir des six coefficients de corrélation existant entre 
les quatre variables étudiées, par les mémes méthodes que celles qui 
ont servi 4 l’analyse de |’ensemble des résultats. 

Le Tableau IV donne les pentes des six droites calculées par trois 
méthodes. Les huit premiéres lignes donnent les résultats obtenus en 
prenant successivement comme variable auxiliaire X; une des grandeurs 
P,,C., -:: ou L, et en appliquant, soit la technique des composantes 
principales (D3), soit la technique factorielle (D3’). Les nombres 
en italiques correspondent 4 X,; = X, et X; = X, ; ils donnent la pente 
de (D) dans la colonne (D3) et les pentes des deux lignes de régression 
classiques dans la colonne (D;’). La neuviéme ligne donne la pente de la 
droite (D) obtenue en négligeant les différences existant entre les co- 
efficients de corrélation; elle est accompagnée de son écart-type qui est 
intermédiaire entre les écarts-types des deux coefficients de régression 
correspondants. Les lignes (D;) et (D‘’) ont été calculées & partir des 
données du Tableau III, et les lignes (Dé) et (D4’) 4 partir du Tableau 
II. Elles donnent les valeurs numériques correspondant aux relations 
obtenues par la technique des composantes principales (D’) et par la 
technique factorelle (D’’) pour quatre variables P, , M, , M, et L, et 
pour les huit X. 

Pour un couple donné de variables X, , X, , (D) est, par définition, 
indépendant. de toute corrélation et les coefficients de régression ne 
dépendent que de r,, . Toutes les autres estimations de la pente de la 
droite (X, , X.) changent avec la nature et le nombre des variables 
auxiliaires choisies et dépendent aussi de la méthode de calcul adoptée. 
Les pentes estimées sont toujours comprises entre celles qui correspon- 
dent aux deux lignes de régression, et assez éloignées de |’une et de l’autre. 
On pourra constater que les pentes des (D{’) et des (Dg’) sont trés ap- 
proximativement égales aux moyennes des pentes des (D3’) correspon- 
dantes, droites de régression non comprises, et que les pentes des (D%) 
et (D3) sont proches des moyennes des pentes des (Dj4’), droites de 
régression comprises et non pas des moyennes des pentes des (D3). 
Une restitution beaucoup plus exacte des pentes des (D’) est d’ailleurs 
obtenue en remplacant dans la moyenne des pentes dés (D4’) les pentes 
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des deux lignes de régression par le double de la pente de (D) corre- 
spondante. Par la s’explique bien que (D’) soit toujours compris 
entre (D) et (D’’) et que, confondu avec (D) lorsqu’aucune variable 
auxiliaire n’est utilisée, il s’en détache pour n = 3 et se rapproche 
graduellement de (D’), lorsque le nombre des variables auxiliaires, 
passe de 1 & 2, puis de 2 4 6. 

*  Ainsi se vérifient les propositions que nous avions énoncées et se 
justifie la préférence que nous avions donnée & la droite d’interpolation 
obtenue par analyse factorielle. 
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THE VARIANCE OF THE GENETIC CORRELATION 
COEFFICIENT 


E. C. R. REEvE* 
Institute of Animal Genetics, Edinburgh 


1. Introduction 


A genetic correlation coefficient measures the degree of association 
between the genetic variations of two quantitative characters in a given 
population, e.g. wing and thorax length in Drosophila (Reeve and 
Robertson, 1953), or weight and market score in pigs (Hazel, 1943). 
Genetic correlations are frequently calculated in work on quantitative 
inheritance and animal breeding research, but no method is available 
for assessing their accuracy when they are estimated from a given body 
of data. It is the purpose of this paper to develop formulae for the 
large-sample variance of a genetic correlation when it is estimated in 
various ways from parent-offspring covariances or correlations. Quali- 
fications to be borne in mind when using these formulae will be discussed 
later. 

Consider a progeny test in which a number of pair-matings are 
made of parents selected at random, and two characters are measured 
on both parents and progeny from each mating. Let a and x be the 
parent plienotypes and b and y the corresponding progeny (or progeny 
mean) phenotypes for the two characters, so that (a, b) refer to character 
1 and (x, y) to character 2, in the two generations. Then the genetic 
correlation (r;) between characters 1 and 2 may be estimated in two 
ways, both of which appear to have been first suggested by Hazel 
(1943). The alternative formulae are as follows: 


_ | |” _ | fay}- {arb} |'? 

ro = [xt ]" - 
_ tay} + {2b} 


where {ay} is the covariance between a and y, etc., and may in practice 
be replaced by the sum of products >> (a — a) (y — 9), provided that all 
four covariances have the same number of degrees of freedom. 


*Member of the Agricultural Research Council's scientific staff. 
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It may be easily shown that both formulae give estimates of the 
same parameter; for suppose that H, and H, are the genetic values of 
the two characters in an individual (i.c. the expected phenotypes of 
individuals of the same genotype, in a given environment), then, assum- 
ing that non-additive genetic variations can be ignored, and writing E 
for ‘Expected value of’, 


Efay} = E{xb} = } cov{H, , H2} 
E{ab} = 30°{H,}, E{xy} = 30°{H2} (2) 
te = cov{H, , 


Formulae (la) and (1b) follow immediately, and either may be 
derived as a least squares estimate, after taking logarithms in the 
third equation of (2) to give a linear formula for rg , depending on 
whether we combine {ay} and {2b} first, to give a separate estimate of 
cov {H, , H,} (formula 1b), or carry out the whole Least Squares 
operation in one step (formula la). 

As we shall see, both formulae give estimates with the same variance 
in large samples, but obviously (1b) must always give a smaller estimate 
than (la), except when {ay} = {xb} and the two estimates are equal. 
Probably both formulae lead to biassed estimates in finite samples, but 
I have been unable to calculate the extent of this bias or to suggest 
which estimate will generally be least biassed. Formula (1b) has the 
advantage of giving a non-imaginary estimate when either {ay} or 
{xb} comes out negative and the other covariances are positive, and 
for this reason this formula tends to be preferred (Morley, 1951). The 
question of their relative merits obviously needs further study. 

The four variables may be individual phenotypes of parent and 
offspring, but a more accurate estimate is obtained when a and z are 
mid-parent values and b and y are the means of, say, n progeny of a 
family. This will apply to a test in which pair matings are made and a 
family of full-sib progeny from each mating is measured; but alternatively 
we may have measurements on families of half-sib progeny and only on 
the parent common to each family. In this case a and z are single 
parent phenotypes and b and y are means of n half-sibs. Forinulae 
(1) and (2) apply in both cases, and they will be treated together. 
Modifications when the characters are sex-limited, or selection and 
assortative mating of parents are practised, will be discussed later. 
Sex-linked effects will be ignored (the separation of sex-linked and 
autosomal genetic variances for one character have been discussed by 
Reeve (1953), but the separation of genetic correlations of the two types 
would raise too many complications for treatment here). - 
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2. Variance when parents are a random sample. 


For convenience, we shall deal with formula (la), and will show later 
that (1b) has the same variance. Suppose the four covariances are all 
estimated from a single progeny test, consisting of f families of n progeny 
and their corresponding parents. We then have a quadri-variate distri- 
bution of a, b, x and y, each of which is correlated with the others, so 
that errors in the four covariances of (la) will be correlated. 

Taking logarithms in (la), differentiating, squaring and taking 
expectations, we obtain approximately: 


_ra[Viay} , Virb} , Vfab} , Vizy} 


2 Cov [{ay}, {zb}] , 2 Cov [{ab}, {ry}] 


+ 


{ay} - {xb} {ab} - {zy} 

_ 2Cov [fay}, {zy}] _ 2 Cov [fab}, {xb}] 
fay} - {xy} {ab} - {xb} 

_ 2Cov [{ab}, fay}] _ 2 Cov [{zxb}, (3) 
{ab} - {ay} | {xb} - {ry} 


where V stands for “Variance of’’, and Cov [{ay}, {xb}] stands for the 
sampling covariance of the covariances of a with y and z with b. To 
make further progress, we must assume that the four variates are 
normally distributed. Then we have, approximately, (writing p for a 
population correlation coefficient): 


V{ay} = + pir), 
so that 
Viay} _1 
fay}? = (1 + +), ete. (4) 
Also 
Cov {tab}, = 126 _ _ ,, 


ignoring terms of order 1/f?. 
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Similarly 
1 
Cov [{ab}, {ay}] = j [Harty — (6) 
The moments y,,,, etc. may be found by extending the M.G.F. for a 
bivariate population: 
M(t, t, ’ t,) = exp + 2 } 
We obtain 
= + ParPoy + PayPoz) (7) 
and 
Mater = Po: + 2 Par * Par} (8) 
Then from (5) and (7) 


Cov [{ab}, {ry}] = 1 Paz’ Poy + Pay” Por (9) 
{ab} - {zy} jf Pad” 
and from (6) and (8) 
Cov [{ab}j, _ 1 
fab}-{az}~ f (10) 


Substituting formulae of type (4), (9) and (10) in (3) we obtain the 
rather impressive formula: 


rg 1 1 1 1 
42 [ax]- [by] + [ay]}-[xb] 42 {ax]}- [by] + [ab]- [zy] 


[ab] - [xy] [ay] - [2b] 


where square brackets, in order to save subscripts, indicate correlation 
coefficients, e.g. [az] = p,, , and should not be confused with {az} = the 
covariance between a and z. 

The correlation coefficients in (11) may be expressed in terms of the 
so-called heritabilities of the two characters and their genetic and 
phenotypic correlations. Suppose a and z are the mid-parent (or 
common parent) phenotypes, and b and y are the means of n progeny, 
where n is the same for all matings. Using subscripts 1 for character 
(a, b) and 2 for character (x, y) and writing h’ for the heritability of 
individuals—i.e. the fraction of the phenotypic variance which is 
genetic—we have the following relations, which hold good for progeny 
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tests in which either the progeny are full sibs and mid-parent values 
are used, or the progeny are }-sibs and the one common parent 1s 


measured: 
= /( n 


[ax] = rp 


[by] = + / ey) 


(12) 


4 


where rp = phenotypic correlation between characters 1 and 2 for 
individuals, k = 1 in a test using mid-parent values and families of full 
sibs, and k = 2 for a test using one parent and families of 3 sibs. 

These formulae are easily derived. Thus, for mid-parent and full-sib 
families, [ab] = hic./o, , = 40; , and = 1/n + (n — 1)rooloi , 
where a; is the phenotypic variance of character (a, b) in individuals, 
and ro9 = $hj is the correlation between full sibs (ef. Appendix and 
Reeve, 1953). 


Substituting equations (12) in (11) and simplifying, we obtain: 


(1 — ro) Jk(L — rp) _ | 


where C = h,h, , 1/D = } (1/hi + 1/h3), and n progeny are measured 
from each of f families. As a further variation, if we are dealing with 
sex-limited characters, in a test such that one parent and families of 
full sibs of the same sex are measured, then k = 1 and all items in (13) 
are multiplied by 2 except the first term: (1 — rg)?/2f. 


oul = hire / (2) 
| 
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It follows from (13) that, if a given total number of individuals are 
to be measured (i.e. nf is constant), the variance of rg is least when 
n = 1, and (13) then reduces to: 


2 
But in practice, of course, it is often easier to increase n than f. 

In tests with poultry and livestock, usually several sires are each 
mated to a number of females, and there may be several progeny from 
each mating, so that each sire provides families of full sibs in a popula- 
tion of half-sibs. Within each sire group we may take half the female 
parent’s phenotypes as the mid-parent values (since the sire’s pheno- 
types for the two characters are constant, whether we can measure them 
or not, and may be scored as zero). Equations (12) then apply with 
k = 1 and with 3n always substituted for n. We may thus estimate 
the genetic correlation by pooling the four covariances within sire 
groups and using formula (la) or (1b), and formula (13) will then give 
an approximate estimate of its variance if we take f as the total number 
of female parents minus the number of sires and n as } the average 
number of progeny per female parent (the method of averaging n will 
be discussed in a later section). 

* Formulae (13) and (14) are expressed in terms of the population 
values of rg, rp , C and D, which will not normally be known; and the 
best we can do is to use estimates either from the progeny test in question 
or from other sources—better estimates of rp , C and D will often be 
available. Significance tests using the variance of rg will, of course, 
be very unreliable, since the variance is nearly proportional to (1 — ré), 
and may be appreciably affected by a small error in estimating rg . 
Moreover, the sampling distribution of rg is probably at least as skew 
as that of the product-moment correlation over most of its range, so 
that its variance, even if known accurately, would not be very helpful 
in setting confidence limits, unless we have an extremely large sample 
and an approach to normality. Nevertheless, an approximate variance 
for an estimate of rg is better than no guidance whatever as to its 
accuracy, and it will also enable us to calculate the size of progeny test 
necessary to give an estimate with a given level of accuracy. It is to be 
hoped that a study of the actual sampling distribution of rg will lead 
to a satisfactory method of setting confidence limits to a sampling 
estimate. 

We may note that, if rg is zero, assuming that rp is also small enough 
to be ignored (i.e. there is little or no environmental correlation between 
the two characters), then the sampling variance of rg reduces to 


V(re) = 
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Al Bi (15) 


where D, as before, is the harmonic mean of hj and h; , and will lie 
between | and 0. This formula may be compared with the variance of 
a product-moment correlation coefficient whose population value is 0, 
which is approximately 1/f for a sample of f pairs. 

Some allowance should be made for the fact that sample estimates 
have to be substituted for the various statistical parameters in calcu- 
lating formulae 13-15. For this purpose it seems best to take f as two 
less than the number of families. 


3. Effect of selection and assortative mating of parents. 


An alternative procedure will now be considered. Selection and 
assortative mating (i.e. picking out extreme + and — phenotypes and 
mating like with like) considerably reduces the sampling variances of 
the regression coefficients of progeny on parent, and does not alter the 
expected values of the regressions on the mid-parent value of the 
selected character, apart from possible bias due to non-additive genetic 
effects (Reeve 1953). We might, therefore, run two separate progeny 
tests, selecting and mating assortatively for character 1 in the first and 
for character 2 in the second. The regressions of b and y on a are 
estimated from the first test, and of b and y on z are estimated from 
the second. Using B to symbolise a regression coefficient, we then have: 


B(b-a) B(y-2) {ab}, {xy}, 
where suffixes 1 and 2 here indicate the test supplying the estimate. 


Proceeding as before, since covariances from the two tests are un- 
correlated, formula (11) reduces to: 


_ iby). 
[ab], -[ay], [xbj,- (17) 


where f/2 families are used in each test. 

Now assume that selection plus assortative mating has the average 
effect in the two tests of multiplying mid-parent variance of the selected 
character by (1 + L)—which can be estimated from the tests. Dealing 
with the case of full-sibs and mid-parent phenotypes only, formulae 
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(12) must be modified as follows (some of these formulae are derived in 
an appendix): 


[by] = + — + 
(BY)(1 + L) 
and here and in the formulae for [ab] and [ay] of (12), in test 1, we 
substitute: Q = hj and: 


B= [1+ — + + D) (48.2) 
Y = (1 + — + + L) 
while in calculating [by], [xy] and [xb] in test 2, we substitute Q = h} and: 
B = [1+ — + + L) 
Y = [1 + 3(n — + + L) 
Substituting in (17), we obtain 


Vive) = (¢: 55) 


The relative value ot this method depends on the magnitude of L. 
If the parents are not a selected sample but are mated assortatively, 
L is the correlation between mates, which can be made to approach 
unity. In an example in Drosophila, in which the most extreme 
(+ and —) half of the available parents were selected and mated 
assortatively, L was 1.74 (Reeve, 1953) and it should be possible to 
obtain values of L approaching 2, provided that the parents can be 
selected from a fairly large sample. , 

If n is fairly large (say 10 or more) we can compare the variances 
of the different estimates ignoring terms in 1/n. We then have, approxi- 
mately: for a single progeny test with f matings (random mating) 


(18.1) 


(18.3) 


Vira) 2f D + 1 Te (20) 
for two tests, each of }f matings with selection and assortative mating, 
2 
Virg) - (21) 
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D is the harmonic mean and C' the geometric mean of hi and h; , and 
(1 — rg — rpr¢/C) will generally be small compared with 1/D. If 
this is the case, two tests with selection and assortative mating of parents 
should give an estimate of rz with variance a little lower—in the pro- 
portion 2 : (1 + L)—than a single test with random selection and 
mating, using the same number of matings. 

The two tests will, of course, provide altogether 8 regressions, of 
which only 4 are used in (16). The 4 regressions from each test could be 
used to give a separate estimate, as described by Reeve (1953), and the 
two estimates could be averaged to obtain a final estimate of improved 
statistical accuracy, with a sampling variance roughly half that of (16). 
The difficulty about this estimate is that the amount of bias likely to 
arise from non-additive genetic effects is unknown, but probably larger 
than the bias in (16). 


4. Equality of variances for estimates (1a) and (1b) 
Taking the case of a single progeny test consisting of families of 
full- or half-sibs with their mid-parent or common parent phenotypes, 
which led to formula (13), we can differentiate (1b) directly, giving 


1 
drg = - {ay} Ea + 2d{xb} 


— (lay) + 4 


Squaring and taking expectations, 


1 


V(ra) = 16f{ab} {zy} [ sv tay + 4V{xb} + 8 Cov [{ay}, {xb}] 


of Viab} , Vizy} Cov [{ab}, 
ov [{ay}, fab}] + Cov [{xb}, {ab}] 
{ab} 


Cov [fay}, fay}] + Cov [{xb}, {xy}] 
{zy} )| 


It is now simplest to work in terms of covariances. From (9) we 
have the simple relations: 


+ 


(23) 


Cov {fab}, try}] = [{ax}- {by} + fay}-{br}] (24) 


from which, by making different variates identical, 


| 
+ (fay} + {xb}) ) 
- 
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Cov [{ab}, fay}] = 4 + {ab} fay} 
(25) 


V{ay} = + {ay}*], ete. 


We next require the equations similar to (12) relating the variances 
and covariances of the four variates to h, , h. etc. These are: 


, we 
a, (14% 


fab) = Shei, (ay) = 


(26) 
{ay} = 2 hhareoio. = {xb} 


{ax} = 
2 1%2 


where as before { } indicates a covariance, 0; and o; are the pheno- 
typic variances of the two characters among individuals in an unselected 
population, and k takes the values 1 and 2, respectively, for full-sib 
families with both parents, and }-sib families with one parent measured. 

Using (24), (25) and (26), we can express the c covariances appearing 
in (23) in terms of h’ etc., e.g. 


Cov [{ay}, {ab = ote] ( 1 | et 
[{ay}, {ab}] 7193) on Tp + ute |, Cc. 

Making the relevant substitutions and simplifying, we obtain 


n 


which is identical with (13). 


k 
= 2 
: k 
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In the same way it may be shown that the conditions of section 3 
lead to a variance identical with equation (19), so that for any of the 
cases considered in this paper we may assume that the same variance 
applies, whether we calculate rz by formula (la) or (1b). 


5. The effect of variable family size. 


One is fortunate in a progeny test if all families yield the same 
number of progeny, and the question arises of dealing with variable n. 
In such a case, the amount of information supplied by the family mean 
is not usually proportional to the number of progeny, so that the 
commonly used weighting factor n is not the correct one. Kempthorne 
and Tandon (1953) have calculated the proper weights to be used in 
computing the regression or covariance of progeny mean on parent so 
that it has minimum sampling variance; but I found this paper rather 
difficult to follow in parts, and it may, therefore, be of interest to derive 
these weights by a simpler procedure, which will make their function 
clearer. 

Suppose that we have families of n sibs, and that P, O and M are 
the mid-parent, individual progeny and progeny mean phenotypes of a 
single character, for any family. Let rop and ry, be the correlations of 
an individual progeny and their family mean with mid-parent pheno- 
type, and let roo be the intra-class correlation between sibs. 

Then, for the variances of the regression coefficients of O and M on 
P, evidently 
V(Bor) = — PY | (28) 
V(Bur) = ou(l — rv)/ (P — PY 


But ow = 06 [1 + (n — 1)roo]/n, and Cov (MP) = Cov (OP), so that 
om’ mp = , and we can write: 


V(Bur) = oo(1 — Too) (: ror) 


(P — \n 1 — roo 
2 
oo(1 Too) 

29 
w, (P — P) 

where ——, 
1+ nT (30) 

and T = Too — Yop 

foe 


Only the denominator of (29) changes with family size n, so that the 


= 
| |} 
a 
\ i 
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weight of a regression coefficient based on families of size n is proportional 
to 


W, = wv, >> (P — 


To obtain the combined regression coefficient with minimum variance 
for variable n, we must first choose a common estimate of the parent 
mean P, and then take a weighted average of the regression coefficients 
for each value of n, using the weight W,. The appropriate estimate 
of P is obviously the weighted estimate 


P (> wP)/>d w, 


summed over families, where P; is the mid-parent value of family i, 
and w;, is the value of w, , as defined in (30), for this family. Retaining 
subscripts n for families of n, and i for the i’th family, the average re- 
gression is 

B= > W.d./> W. = P? (31) 


These formulae apply equally if we are dealing with families of 
half-sibs correlated with their common parent P, and we then have 
the situation discussed by Kempthorne and Tandon (1953), in which 
Tor = Band roo = p, in their terminology. w, is their weighting factor 
for families of n; progeny, and (31) is identical with their weighted re- 
gression coefficient. We now see that this equation is simply the result 
of choosing a common parent mean P and weighting the regression 
coefficients for each value of n in proportion to the reciprocals of their 
variances. 

There are two cases to consider: 

(1) families of 3-sibs correlated with their common parent (as discussed 
by Kempthorne and Tandon) . 


Tor = Too = 4h’, B= hh’, 
(2) families of full sibs correlated with their mid-parent phenotype 
Tor = h?/V/2, Too = 3h’, B= h’, 


where £ is in each case the regression of O on P. It follows that: for 
case (1) 


_ — _ — 28) 
for case (2) 


7 _ 6-8) 


(33) 


2—h* 2-8 
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These results are only strictly valid if we can ignore any non-additive 
genetic variance and any environmental correlations between progeny 
of the same family. ° 

Kempthorne and Tandon point out that a value of 7 must be 
guessed in order to obtain the weighting factors w, , but they give no 
indication how such a guess should be made (whether by gazing at a 
crystal or at the data), and this point seems to need clarification. If 
there are no better indications from other sources, one might suggest 
the use of formula (32) or (33) above, 6 being estimated from the un- 
weighted regression of progeny on parent or mid-parent. In the example 
given by Kempthorne and Tandon, the unweighted estimate of 8 is 
0.136, so that formula (20) gives T = + 0.053, a value close to their 
guess of 7 = 0.04 (whose origin is not stated). 

The effect of using any set of weights on the variance of the genetic 
correlation (13) will depend on its effect on the value of n turning up in 
formulae (12). Without proof, we state the following approximate 
results, which may be obtained by considering the effects of the various 
systems of weights on the R.H.S. of formulae (12). 

Three sets of weights w; may be considered, where the covariances 
in (1) are calculated as z w,M(P; — P) in the terminology of the 
present section 


(1) All w; = 1, ie. all family means given equal weight, regardless of 
the number of progeny in the family, 


(2) w, = n; , i.e. progeny means weighted in proportion to number of 
progeny measured, 


(3) w, = n,/(1 + n, T), for minimum variance of regression coefficients. 


Then for n in formulae (13)-(15) and (19) we make the following 
substitutions: 


| = 


n; 


where > indicates summation over families and f is the number of 
families. (3) reduces to (1) and (2), respectively in (34), when w; is 
put equal to 1 and ton, . 

When using the weights w; = n,/(1 + n; T), the values of T(T, 
and 7',) may he different for the two characters. If 7, and T, do not 


) n= | 
Q) 
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differ much, they can be averaged to give a single set of weights (w,), 
without losing much information. If they differ enough to make 
appreciable differences in the weights for the *two characters, then 
separate sets of weights should be used, and n in equation (3) of (34) 
may be taken, approximately, as the geometric mean of the values of 
n calculated separately from the weights of each character. If separate 
sets of weights are used, in estimating the genetic correlation by equa- 
tion (1) the covariances should be calculated using the weight ap- 
propriate to the progeny variate in each case. 


6. Examples. 


In calculating the covariances from a progeny test, it may not be 
worth while to use Kempthorne and Tandon’s weighting factors w, , 
unless there is a good range of variation in progeny numbers. The 
genetic correlation may be estimated by either of formulae (1), and to 
estimate its variance we need only to apply formula (13) or the ap- 
propriate variation of it. For this purpose we need estimates of rp , 
the phenotypic correction between the two characters in individuals of 
an unselected population, and of h} and h; , the fractions of the pheno- 
typic variances of the two characters which can be attributed to 
additive genetic variations. These may be estimated from the regression 
coefficients in the progeny test, or from other sources, in the usual way. 
Then C = h,h, and D = 3(1/hi + 1/h3). 

As an example, tests on Drosophila melanogaster (Reeve and Robert- 
son, 1953 and unpublished) show that in typical wild stocks wing and 
thorax length both have heritabilities of about 0.3, so that C = D = 0.3, 
and their phenotypic correlation is about 0.8, while the genetic corre- 
lation was estimated as 0.75. 

Accepting these figures, in a progeny test with random mating, 
having f matings and n progeny per mating, formula (13) gives the 
variance of r, as 0.4/f + 1.5/nf. To obtain a standard error of + 0.1 
we need about 200 matings if n = 1, and about 60 matings if n = 10; 
and even if n is very large we still need about 40 matings to give the 
same accuracy. 

In this case r,, is rather large, but as it declines the variance in- 
creases roughly in proportion to (1 — r¢). Thus if both rg and rp are 
so small that they can be neglected in (13), and hj and h;3 are still both 
0.3, the same progeny test gives V(rz) = (2.1/f) + (10/nf). This 
variance is over 6 times as great as before, so that we should now need 
1200 matings with n = 1, and over 200 matings with n large, in order 
to obtain a standard error of + 0.1. 


< 
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7. Summary. 


A formula is developed for the variance in large samples of the 
genetic correlation coefficient between two characters, estimated from 
the four parent-offspring covariances or correlations for the two charac- 
ters. ‘The variance is expressed in terms of the population values of the 
genetic and phenotypic correlations between the two characters and 
their heritabilities, and may be applied to progeny tests in which full- 
sib families and mid-parent values, or half-sib families and their common 
parent’s phenotypes are measured. It can also be used for sex-limited 
characters or when covariances are calculated ‘“within-sires”. A 
modified formula is derived for the case of two progeny tests, one using 
selection and assortative mating for each character. The adjustments 
necessary when there are variable numbers of progeny per family are 
discussed, and an example illustrates the size of test necessary to 
estimate the genetic correlation with given accuracy, under various 
conditions. 

It is shown that the variance is the same, whether the arithmetic 
mean or the geometric mean of the two covariances involving both 
characters is used in calculating the genetic correlation. 


The limitations to be placed on the use of the variance formulae are 
discussed. 


8. Appendiz: selection and assortative mating. 


On the assumption that we are dealing with strictly additive genetic 
effects, in a progeny test involving pair matings and full-sib progeny 
families, selection and assortative mating of parents for a given character 
both have the same effects on the progeny parameters, these effects 
depending simply on the factor (1 + 1) by which mid-parent variance 
is thereby multiplied (Reeve 1953). 

Consider first the case of assortative mating of unselected parents, 
so as to introduce a phenotypic correlation L between mates for character 
1. The path coefficient relationships between parents and offspring 
for two characters (subscripts 1 and 2), under these conditions, are 
shown in fig. 1. Primes indicate the parameters of the progeny genera- 
tion, and otherwise the symbols have the notation used by Reeve 
(1953). It will be recalled that s is the additional correlation between 
the gametic genotypes of the two characters when these are carried in 
the same gamete as distinct from sister gametes. For unselected 
parents, s = }3rq , and ensures that the genetic correlation in zygotes 
remains constant from generation to generation, under random mating. 

F, and F, are the correlations between uniting gametes for the two 


L 
| 
| 
~ | 
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characters, and are shown by dotted lines, since they are already implicit 
in the paths of the parents and must not be counted twice. 
Fig. 1 gives immediately the following relationships: 


} (35) 
F, = $Lhirg 
and for each character separately: 
a’ = 1/2(1 + F) 
ow = (1+ 
op = (1 + Fh’)o; 
h”? = + F)/(1 + Fh’) 
ah” = h?/2(1 + Fh’) - 


Let Cov (P’) and r; be the phenotypic covariance and correlation 
between characters 1 and 2 in a single progeny, and Cov (12’) and rj, 


v 


(36) 


FIG. 1. EFFECT OF ASSORTATIVE MATING FOR ONE CHARACTER ON RELATIONS 
OF PARENTS AND OFFSPRING FOR TWO CORRELATED CHARACTERS, 


the covariance and correlation between characters 1 in one progeny and 
2 in its sib of the same sex. Let E’ and Cov (E’) be the environmental 
components of r7 and Cov (P’). We assume no environmental correla- 
tion between sibs. Then, from fig. 1: 


® 
( 
t 
e 
e 
ti 
b 
e 
| f 
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re = + dre + + (37) 


+ 


The 
In terms of covariances, after substituting from (36), these become: 


Cov (P’) = (rp + haa 
Cov (12’) = 


(38) 


where all the parameters on the R.H.S. refer to the random-mating 
population. In obtaining the first equation of (38), we note that non- 
random mating of parents does not affect Cov (’), so that 


Cov (k’) = (rp 


For the covariance {by} between the means of n sib progeny of the 
two characters, we obtain: 


{by} 


* [Cov (P’) + = 1) Cov 


1 3 
[rp + 3(n — + (39) 


By an easy extension of fig. 1 it may be seen that, for two sibs and a 
single character we have 


o} = (1 + | (40) 
Cov (P’P’) = + 
For the family mean of n sib progeny (P’) these equations yield 
[1 + — + (41) 


On interchanging subscripts 1 and 2 in the above formulae we obtain 
the corresponding results for assortative mating of character 2. 

Since assortative mating and selection of parents have the same 
effect on the progeny parameters, equations (39) and (41) apply to 
either system or to the combined effects of both, provided that their 
total effect is to multiply mid-parent variance for the character selected 
by (1 + L). Thus, on putting in the appropriate subscripts, the above 
equations give the progeny variance and covariance formulae needed 
for deriving equations (18). 

As a corollary, we may deduce that selection of parents so as to 
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multiply the mid-parent variance of character 1 by (1 + K) converts 
the parameter s of fig. 1 into: 


8 = + + (42) 


It should be emphasised that the equations referring to non-random 
mating are only strictly valid when all the genetic variance is additive 
and there are no environmental correlations between parent and off- 
spring (the first of these conditions is, no doubt, rarely satisfied in 
practice). The amount o’ ‘turbance to be expected with any departure 
from these conditions is © 
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TESTS FOR LINEAR TRENDS IN PROPORTIONS AND 
FREQUENCIES 


P. ARMITAGE 
Statistical Research Unit of the Medical Research Council, 
London School of Hygiene and Tropical Medicine 


1. Introduction 


One frequently encounters data consisting of a series of proportions, 
occurring in groups which fall into some natural order. The question 
usually asked is then not so much whether the proportions differ 
significantly, but whether they show a significant trend, upwards or 
downwards, with the ordering of the groups. In the data shown in 
Table 1, for instance, the usual test for a 2 X 3 contingency table 
yields a x” equal to 7.89 on 2 degrees of freedom, corresponding to a 
probability of about 0.02. But this calculation takes no account of the 
fact that the carrier rate increases with the tonsil size, and it is reason- 
able to believe that a test specifically designed to detect a trend in the 
carrier rate as the tonsil size increases would show a much higher 
degree of significance. 


TABLE 1 
Relationship between nasal carrier rate for Streptococcus pyogenes and size of tonsils, 
among 1398 children aged 0-15 years. (Data from Drs. M. C. Holmes and R. E. O. 
Williams, summarised by Holmes and Williams, 1954) 


Present, but Enlarged tonsils 
not enlarged r Total 
+ ++ +++ 
Carriers 19 29 24 72 
Non-carriers 497 560 269 1326 
516 589 293 1398 
Carrier-rate 0.0368 0.0492 0.0819 


No originality is claimed for the tests discussed in this paper. They 
will be familiar to many statisticians, and may be derived as particular 
cases of procedures already published for contingency tables with any 
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number of rows and columns. Since the situation in which one of the 
classifications in a contingency table is a dichotomy (so that the data 
form a series of proportions) occurs so frequently, it is hoped that an 
explicit discussion of this case may be of interest. 

We shall regard the data as forming a 2 X k contingency table, and 
use the following notation: 


Column 
1 2 3 tee k Total 
Row 1 Ne N3 Ny t 
Row 2 Nz — N; — Ni- T-t 
N, N; N, 


The proportion of individuals in the i-th column, which fall into 
the first row, is denoted by p; = n,/N; , and the overall proportion is 
P = t/T. In summations (which are always over the k columns), we 
shall omit the suffix i. Thus, Nz will denote N.z; . 


2. A test based on scores 


To measure and test the significance of the trend in the p, , a natural 
procedure is to allot a score x; to the i-th column (zx, < 2, < --- < %), 
and to perform some sort of regression analysis of p on z. In addition 
to the column scores z; , let us allot to each of the 7’ observations a row 
score, y, taking the values y = 1 for each observation in Row 1, and 
y = 0 for Row 2. Then the mean value of y for the 7-th column is 
clearly n;/N; = p; , and the overall mean of y is t/T = P. Thus, a 
regression analysis of y on x will be equivalent to one of p on x (p, 
being weighted in proportion to N;). The 7 values of y could now be 
subjected to a formal analysis of variance, between and within columns, 
as follows: 


Degrees of Sum of 
freedom squares 
Between columns 
Due to linear regression 1 8, 
Departures from linearity k-—2 S, 
S, + 8S, 
Within columns T—k S; 


Total T-1 S, + S, + S; 


L 


V 
7 
t 
I 


| 


‘ 

a 
a 
f 
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where S, = {>> — M(x (1) 
Si + 8. = N(p 
S; = > 
S,+ S. + S, = TP(1 — P), (2) 
and z= Nz/T. 


Consider first the problem of testing for general heterogeneity 
between columns. As in the usual model for the analysis of variance, 
we assume that in repeated sampling the column totals N; are fixed. 
The null hypothesis is that the expected value of y (and hence of p,) is 
the same for all columns. The usual analysis of variance test is to 
calculate the variance ratio {(S, + S,)/(k — 1)}/{S8;/(T — k)}. 
However, with a variate such as y, taking only the values 0 or 1, the 
normal theory is strictly valid only for large samples, and in these 
circumstances a number of alternative approximate tests are available. 
In particular the usual formula for x’ on k — 1 degrees of freedom can 
be expressed as 


(S, + S2)/{(S: + S2 + S;)/T}. (3) 


Here the denominator is taken from the “Total’’ row in the analysis 
of variance table, but with the divisor T instead of the total degrees of 
freedom T — 1. In all these alternative tests, the tabulated x’ distri- 
bution is strictly valid only asymptotically for large sample sizes, and 
the tests become equivalent as the N, increase, provided that the null 
hypothesis is true. 

Similarly, to test the significance of the regression, the usual analysis 
of variance procedure would be to compare S, with S, (or perhaps with 
S, + SS, if we ignored the possibility of departures from linearity). 
An alternative test, equivalent in large samples if the null hypothesis 
is true, is to calculate 


xo S,/{(S, + 8S, + S;)/T}, (4) 


which is distributed approximately as x’ on 1 degree of freedom. If 
we wish to calculate confidence limits for the regression coefficient, 
assuming that the true value might differ from zero, we should use 
S,/(T — k) as an estimate of variance rather than (S, + S, + S,)/T. 

Which of the various alternative criteria: follows most closely its 
assumed sampling distribution, for small samples, is a matter for 
further study; (see the Appendix, §6). In the meantime, there seems 
little objection to the use of (4). This criterion is equivalent to that 


| 
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proposed by Yates (1948) for contingency tables with any number of 
rows and columns. For k = 2, it is equivalent to the usual x’ criterion 
for 2 X 2 tables (without continuity correction). For k > 2, a com- 
parison of (3) and (4) shows that x¢ is a part of the total x’, the differ- 
ence between the two values representing departures from linearity, 
and having k — 2 df. 

Denoting by b the estimated regression coefficient of y on x, and 
by V(b) the estimated* sampling variance of b on the null hypothesis, 
we find that 


TS 
_ — 2) 
and, from (1), (2) and (4), ; 
_ T{T — t Nx}? 
UT — Nx — (>, 


on | degree of freedom. 

The calculations cannot be performed until the scores z; have been 
chosen. In the absence of any a priori knowledge of the type of trend 
to be expected, it seems reasonable to choose the z; to be equally-spaced, 
and it will often be convenient to have them centred around zero. This 
is the procedure advocated by Yates. Thus, for k columns, we should 
choose z, = — — 1), = — — 3), --+ , = — 1). The 
choice of scores is discussed further in a later section. It should be 
emphasized that, whatever scoring system is chosen, the validity of 
the significance test is not affected; that is, if the null hypothesis is — 
true, a value of x5 significant at the a% level will occur only about a 
times out of 100. 

As an example, using the data of Table 1, we shall allot equally- 
spaced scores as follows: z, = — 1, x, = 0,2; = 1. We obtain 


= —223, Nz’ = 809, 
whence, frem (5), (6) and (7), 


(5) 


V(b) = (6) 


(7) 


b = 0.02131, 
V(b) = 0.000063 160; V V(b) = 0.00795, 
and xs = 7.19 on 1df. (P = 0.007). 


*In repeated sampling with both sets of marginal totals fixed, the expression (6) is (7 — 1)/T 


times the ezact variance of b. This can be shown from results given by Haldane (1940). 
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The test for trend indicates, as expected, a considerably higher degree 
of significance than the total x’ of 7.89 on 2 d.f. The test for departures 
from linear regression gives x” = 7.89 — 7.19 = 0.70 on 1 d.f., which is 
non-significant. In this particular example, the association between 
carrier rate and tonsil size may be due to the association of both factors 
with the age or social class of the child. 

Yates (1948) points out that the same formula for x5 is obtained 
whether one considers the regression of row score on column score, or 
that of column score on row score. Now, when there are only two rows, 
a test for the regression of column score on row score is equivalent to a 
test for the difference between the mean column score for the first row 
and that for the second row. For some types of data, particularly 
where the row totals are fixed beforehand, it will be more natural to 
think of the x¢ test in this way, rather than in terms of the regression 
of pon x. In the data shown in Table 2, for instance, the row totals, 
32 and 32, were fixed by the experimental design, and it seems more 
natural to ask whether the mean scores in the two treatment groups 
differ significantly, rather than whether the proportion of patients in 
group A, in each column, shows a linear trend with the score. In this 
example, the total x? = 5.91 on 3 df. (P = 0.12), whereas x5 = 5.26 
on 1 df. (P = 0.02), showing a fairly definite improvement in group 
A as compared with group B. 


TABLE 2 


Changes in size of ulcer crater, 3 months after start of treatment, for patients in two 
treatment groups (From Table IV of Doll and Pygott, 1952) 


Number of cases with crater 
Treatment r AS ~ Total 
group Larger Less than | 2/3 or more Healed 
2/3 healed healed 

A 6 4 10 12 32 

B 11 8 8 5 32 

17 12 18 17 b+ 
Score, 2; -1.5 —0.5 +0.5 +1.5 


A test criterion exactly equivalent to x5 has been used in genetical 


applications by Fisher and Ford (1947, p. 163) and by Holt (1948, p. 
148). A recent example of the use of this test, in a 2 X 3 table, is given 
by Griineberg (1955). He compares the proportions of animals in two 
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stocks which show some effect on 0, 1 or 2 sides of the body. The 
formula for x’ given by C. A. B. Smith in the Appendix to Griineberg’s 
paper is equivalent to our (7). The more general problem in which 
more than two stocks are compared could be treated by Yates’s methods. 


3. Trends in frequencies 


If P = t/” is very small, we may substitute 7/(7 — t) ~ 1 in (7). 
Defining e; = tN;,/T, the “expected” frequency corresponding to the 
observed frequency n; , we find from (7) that 


{dD 
Xo (>: ex) /t 


The numerator of (8) is the square of the cross-product, U, of the 
scores z; with the discrepancies n; — e; . The denominator is equal to 
dee (x — 2)’, where = = Yex/t, ie. a weighted sum of squares of the 
z; about their mean, the weights being the expected numbers. The 
test is thus based entirely on the frequencies in the first row, and is 
clearly valid only when the sampling errors of the frequencies in the 
second row are relatively negligible. The frequencies in the first row 
may be thought of as those occurring in a sample of size ¢ from a multi- 
nomial distribution. The denominator of (8) is then obtained directly 
as the variance of U in repeated sampling, with ¢ and the expected 
frequencies e; kept constant. The expression (8) may thus be written 
as x5 = 

In Table 3, the expected frequencies e; , have been obtained by 
sub-dividing the total number of maternal deaths, 127, in proportion 
to the number of mothers at risk during each of the eight periods. 
The last line of Table 3 suggests, perhaps, a slight tendency for the 
maternal mortality rates to fall. The scores, z; , have been taken as 
the mid-points of the different periods, minus 1900. The total ’, 
calculated from the observed and expected frequencies is 3.91 on 7 d.f. 
(P = 0.79); even if the whole of this quantity were ascribed to regression 
it would barely reach the 5% level of significance on 1.d.f. In fact, 
application of (8) gives x5 = 1.27 on 1 df. (P = 0.26). The data, 
therefore, do not provide any evidence for a gradual decline in maternal 
mortality amongst women of this particular parity and age-group. 


4. Kendall’s rank correlation test 


An alternative approach to data like those in Tables 1 and 2 is to 
apply rank correlation methods (Kendall, 1948; Stuart, 1953). In 
Table 1, for instance, we could regard the 1398 children as being ranked 
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TABLE 3 


Maternal mortality in New South Wales, for primiparae aged 40 and over. (From 
Tables I, II and III of Wilcocks and Lancaster, 1951) 
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1894-1900 | 1901-1907 1908-1910 | 1911-1920 

EF —2.5 4.5 9.5 16.0 
Number of mothers, N; 346 454 272 1133 
Deaths 

Observed, n; “10 9 3 23 

Expected, ¢; 6.603 8.664 5.191 21.621 
Maternal mortality rate, 

per 1,000 28.90 19.82 11.03 20.30 

1921-1930 | 1931-1937 | 1938-1942 | 1943-1948| Total 

Zs 26.0 34.5 40.5 46.0 
Number of mothers, N; | 1546 909 699 1296 6655 
Deaths 

Observed, n; 32 17 13 20 127 

Expected, ¢; 29.503 17.347 13.339 24.732 | 127.000 
Maternal mortality rate, 

per 1,000 20.70 18.70 18.60 15.43 


in two ways. In the first ranking (corresponding to the rows of Table 
1), 1326 individuals are tied with a rank of (1 + 1326)/2 = 663.5, and 
the remaining 72 are tied with a rank of 1326 + (1 + 72)/2 = 1362.5. 
In the second ranking (for columns), 516 are tied with a rank of 
(1 + 516)/2 = 258.5, 589 are tied with a rank of 516 + (1 + 589)/2 = 
811.0, and 293 are tied with a rank of 516 + 589 + (1 + 293)/2 = 
1252.0. To test for a tendency for the carrier-rate to increase or de- 
crease with tonsil size, we could apply the usual techniques of rank 
correlation, making allowance for the considerable number of ties. 

To calculate Kendall’s statistic, S (§1.9 of his book), we form the 
sum of products of each frequency in the second row with the frequencies 
above and to the right of it, and subtract the sum of products of each 
frequency in the first row with those below and to the right of it. Thus, 
in the notation previously used: 


S = (Ni — + ms + +m) + (No — + +m) 
+ + — — 1,{(N2 — m) + + (M. — m)} 


—m_(N, — m). (9) 
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When the null hypothesis is true (i.e. there is no association), the 
variance of S is (writing Kendall’s (4.5) in the present notation), 
uT — 2) _ 3 
V9) = (10) 
(Stuart (1953) considers inequalities for the variance when the null 
hypothesis is not true.) A test for association is thus provided by 


xi = S’*/V(S) on 1d. (11) 


If k = 2, xi is equal to (7 — 1)/T times the usual x’ for a2 X 2 table 
(without continuity correction). This factor is of no great importance, 
in view of the asymptotic nature of the assumed x’ distribution. 

At first sight the approach of §2 seems to bear little relationship to 
that of the present section. In point of fact the two methods are 
quite closely related. It is known (Hemelrijk, 1952) that when one of 
the classifications in a rank correlation table is a dichotomy, Kendall’s 
test based on S is equivalent to Wilcoxon’s test for the sum of the 
ranks in one of the sub-groups (see Kruskal and Wallis, 1952, for 
references). This, in turn, is equivalent to a test for the difference 
between the mean ranks in the two sub-groups, since the overall sum of 
ranks is constant. This difference would be the same as the difference 
in mean column scores, discussed in §1, if we chose the score for each 
column to be equal to the mid-rank for that column. Thus, we should 
have z, = (1 + N,)/2, x = (1 + 2N, + N,)/2, 2; = (1 + 2N, + 
2N. + N;)/2, ete. It would, therefore, not be surprising if the x? test 
were closely related to the x5 test with the z,; chosen in this way, or at 
least chosen so as to be linearly related to these values. It is not 
difficult to show directly that this is so. 

Rearranging the terms in (9), and writing p; = n;/N; , we find that 


S= > nz, (12) 
= (1+ 2N,+ --- —(T4+D). (13) 


These scores z; are linearly related to the mid-ranks given above, and 
it can easily be verified that 


Nx =0 > = — N’)/3. (14) 
Hence, from (7) and (14), 
2 37° S’ 


= {T/(T — 1)}xi, _ from (10) and (11) (15) 
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When the N; are equal, the xz; are clearly equally-spaced. The 
rank correlation test is then equivalent to the regression test with 
equally-spaced scores, except for the factor 7/(7’ — 1) in (15). As 
already stated, this factor is asymptotically unimportant. The tests 
would have been exactly equivalent if, in the formula (4) from which 
(7) is derived, the total degrees of freedom, 7 — 1, had been used as a 
divisor in the denominator, instead of 7’. As we have seen, when k = 2, 
xs agrees with the usual x’ for a 2 X 2 table, whereas xi differs from 
it by a factor (7 — 1)/T. 

As examples of the rank correlation test, formulae (9)—(11) have 
been applied to the data shown in Tables 1 and 2. For Table 1, 


S = 16229,  V(S) = 38,543,560.2, 
and xi = 6.83 (P = 0.009), 
as compared with x5 = 7.19. For Table 2, 
S = 330 V(S) = 20720.25 
xi = 5.26 (P = 0.02), 


as compared with x5 = 5.26 (the exact agreement being coincidental). 


5. Choice of test 


Since the rank correlation test has been shown to be equivalent 
(apart from the factor 7'/(7' — 1)) to the regression test, with a particular 
choice of scores depending on the N; , the decision whether to use x; or 
xo reduces to a choice of the most suitable system of scoring. In most 
situations there will be no prior reason to expect any particular type 
of relationship, and it is difficult to formulate any general advice. 

If the columns are defined by a measurement, like age, it will often 
be reasonable to choose scores linearly related to the values assumed 
by the measurement, taking mid-points of groups where necessary (as 
in Table 3). 

If the columns are defined by a qualitative classification as in 
Table 1, the choice is more arbitrary. If the problem is primarily 
thought of as a trend in proportions in well-defined ordered groups, 
the regression method with equally-spaced x; seems the most appropri- 
ate. An estimate is obtained of the mean change in p; from group 
to group, and one avoids the use of scores depending on the N; which 
may be difficult to interpret. If, on the other hand, the grouping by 
columns is arbitrary, there may be little virtue in using equally-spaced 
x; , and the rank correlation method is perhaps the more objective. 
Fortunately, the two tests will usually give fairly close results. 
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It may be of interest to conclude with a historical note. The reader 
will find a number of sets of data suitable for analysis by the methods 
outlined here, in two papers by Karl Pearson (1909, 1910). In the first 
paper, Pearson considered situations in which the columns corresponded 
to a numerical variate; the rows were assumed to represent a dichotomy 
of an underlying normal variate and the method provided an estimate 
of the hypothetical correlation coefficient (sometimes referred to as 
“‘biserial r”’). In the second paper, the method was extended for data 
in which the columns were qualitatively defined, but might still be 
ordered; an estimate of the hypothetical correlation ratio (“biserial 
n’’), not dependent on the ordering, was obtained, and the trend was 
assessed by inspection. These methods have largely fallen into disuse, 
partly because of difficulties in determining the sampling errors of the 
coefficients, and partly because the existence of a normal variate 
underlying the dichotomy by rows was not generally accepted. 


6. Appendix. Exact distribution of xi on the null hypothesis 


The exact distribution of x¢ has been determined, by enumeration 
of all possible results, for the case where k = 3, N, = N, = N; = 10, 
and the n,; each follow the binomial distribution ( + }4)"°. The 
probabilities with which x> exceeds various tabulated percentiles of the 
x’ distribution on 1 d.f. are shown in Table 4. This table also shows 
the cumulative distribution of an alternative test criterion, 


x2 = 8,/{(S2 + S;)/(T k)}, 


in the notation of §2. The formula for x2 differs from that for xi , (4), 
in having as denominator the mean square about regression. The 
two test criteria are connected by the relationship 


x2 = (T — 2)xo/(T — x0), 
= 28x0/(30 — 


since T = 30. 

Although the expected frequencies in this example are as low as 5, 
Table 4 shows (a) that there is little to choose between the two tests 
up to about the 5% level of significance, and (b) that the distribution 
of either test criterion agrees well with the theoretical x’ distribution 
between the 50% and 5% points. The appreciable discrepancy at 
the lower end of each distribution is due to there being a probability 
of 0.176 that x5 = x3 = 0. It would be dangerous to generalize from 
this example alone, but the results are at least encouraging. 
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TABLE 4 
Cumulative distributions of two alternative test criteria, in case described in text 


Cumulative probability 

Values of | - 

x? Tabulated xe x2 

0- 1.000 1.000 1.000 
0.0#157- 0.990 0.824 0.824 
0.0°628- 0.980 0.824 0.824 
0.07293- 0.950 0.824 0.824 
0.0158- 0.900 0.824 0.824 
0.0642- 0.800 0.824 0.824 
0.148- 0.700 0.824 0.824 
0.455- 0.500 0.504 0.504 
1.074- 0.300 0.264 0.264 
1.642- 0.200 0.263 0.263 
2.706- 0.100 0.116 0.116 
3.841- 0.050 0.042 0.044 
5.412- 0.020 0.014 0.042 
6.635- 0.010 0.012 0.013 
10.827- 0.0010 0.0005 0.0028 


I am indebted to Professor A. Bradford Hill and Dr. J. O. Irwin 
for commenting on the first draft of this paper; to Dr. M. C. Holmes 
and Dr. R. E. O. Williams for permission to quote, in Table 1, details 
not appearing in their paper; and to Miss Irene Allen for computing 
assistance. 

Since this paper was accepted for publication, the regression test 
based on x5 has been discussed by W. G. Cochran (1954), Biometrics 
10: 417-451 §§6.2, 6.3. 
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AN EXAMPLE OF THE TRUNCATED POISSON 
DISTRIBUTION 


TD. J. Finney* anp G. C. VaRLEY 
University of Oxford 


1. The problem and the data 


The female knapweed gall-fly, Urophora jaceana, lays its eggs in 
batches within unopened flower-heads of black knapweed, Centaurea 
nemoralis. The eggs are easily seen and counted when the flower-head 
is split open. The second instar larva hatches from the egg, moves a 
short distance within the flower-head, and produces a hard gall-cell 
in which occur all further stages of development up to emergence of 
the adult fly. 

As part of his intensive study of population balance in the gall-fly, 
Varley (1947, pp. 158-161) wished to estimate the total mortality 
that occurs between oviposition and gall formation. He had records 
from samples of flower-heads in 1935 and 1936, first for numbers of 
eggs per flower-head and at a later date for numbers of gall-cells per 
flower-head. The sampling procedure is not under discussion here. 
Table 1 contains the results, each sample relating to different flower- 
heads since the process of counting is destructive; the number of ‘empty’ 
flower-heads is omitted because a multitude of causes not relevant to 
the investigation may secure that no eggs are laid. Some eggs or 
larvae will fail to produce gall-cells, but each gall-cell observed corre- 
sponds to only one egg. 

These data will be used here as examples of the utility of the truncated 
Poisson distribution, which has recently been the subject of several 
papers. Analysis with the aid of that distribution leads to some modi- 
fication of Varley’s previous conclusions from the data, at least in 
respect of the strength of evidence for the occurrence of competition 
within the flower-head. 


2. The earlier analysis 


Varley inquired whether the mortality between oviposition and 
gall formation was independent of the number of eggs per flower-head, 
his expectation being that relatively more deaths would occur in the 
heavily populated flower-heads. Suppose that failure of eggs to produce 
gall-cells occurs entirely randomly in a proportion @ of eggs. Then, 
of flower-heads that initially have x eggs, the proportion that later 


*Now at the University of Aberdeen. 
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TABLE 1 
Observed frequencies of eggs and gall-cells 


No. of eggs Frequency of flower-heads 
or 
gall-cells 1935 1936 
Eggs Gall-cells Eggs Gall-cells 
] 29 287 22 90 
2 38 272 18 96 
3 36 196 18 57 
4 23 79 ll 26 
5 8 29 9 10 
6 5 20 6 4 
7 5 2 3 5 
8 2 0 0 0 
9 1 1 1 1 
10 0 0 0 0 
1l 0 0 0 0 
12 1 0 0 0 
>12 0 0 0 0 
Total 148 886 88 289 


contain y gall-cells will be determined by the binomial distribution as 


Using the observed egg distribution from Table 1, Varley calculated 
the gall-cell distribution for various trial values of @ and compared 
these with the observed distribution in the same year by means of x’. 
He estimated 6 as that value which minimized x’, and obtained a variance 
for the estimate from the rate of change of x’ in the neighbourhood of 
the minimum. His estimates were 0.289 + 0.022 for 1935 and 0.323 + 
0.035 for 1936. In 1935, his minimum ,’ was sufficiently large to suggest 
that the mortality was not operating at random but that possibly it 
was higher for the flower-heads with many eggs than for those with 
few. 

The minimization of x’ is well-known to be a fully efficient procedure 
for estimating a parameter from largé samples. In the present problem, 
however, it fails to take account of the fact that the distribution of 
eggs used in calculating the expect« frequencies for the cell distribution 
is itself based upon a sample. If these expected frequencies were 
expressed as functions solely involving unknown parameters, or if the 
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observations on gall-cells had been made at a later date on the same 
flower-heads as were used for the egg records,* the method would be 
appropriate. In the particular circumstances of Varley’s data, though 
the minimization of x’ is still a valid method of estimating the mortality 
rate, the values of x’ will be increased by neglect of the sampling varia- 
tion of the egg distribution. Consequently, evidence for heterogeneity 
of mortality will be exaggerated and the standard error of the estimate 
of 6, obtained from consideration of the rate of change of x’, will be 
biased downwards. If the number of flower-heads used for the egg 
records were much larger than that for gall-cells, sampling errors in the 
egg distribution could safely be ignored; for both years, however, the 
egg sample was substantially smaller than the gall-cell sample. A 
process complementary to that used by Varley, namely taking the 
gall-cell distribution as fixed by the sample and calculating x’ by 
comparison of the egg distribution with a theoretical one giving the 
right gall-cell distribution ought to give more trustworthy results when 
the latter is based on so much the larger sample, but this would introduce 
new difficulties because some flower-heads containing eggs will yield 
no gall-cells and because no egg distribution could be found to give 
exactly a specified gall-cell sample as expected frequencies. A more 
satisfactory method is to treat the two samples similarly, first estimating 
comparable mean numbers of eggs and gall-cells and then estimating 
the survival rate, (1 — 6), as the ratio of these. 


3. The truncated Poisson 


The mean numbers of eggs and gall-cells per flower-head taken 
directly from Table 1 are not comparable, on account of the omission 
of zeros: some flower-heads containing eggs will produce no gall-cells, 
so that the ratio of these means would tend to overestimate the survival 
rate. Further progress seems to demand the use of a reasonably simple 
parametric model of the situation. The frequency distribution of z, 
the number of eggs per flower-head, in the two samples reported in 
Table 1, has the appearance of a Poisson distribution truncated by the 
absence of observations for x = 0, and it is of interest to see whether 
the data can be adequately described in this way. The most obvious 
suggestion, that a Poisson distribution is generated ‘by the random 
deposition of single eggs laid on such flowers as are at the right stage of 
development, is untenable here since it is known that a female normally 
lays several eggs at a time and that one flower-head rarely receives eggs 
from more than one female; the distribution observed must therefore 


*In this investigation, it was not possible to count eggs without destroying the flower-head, but in 
analogous studies of other organisms such an observational programme might be adopted.! 
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approximate closely to that of the size of egg batches. Nevertheless, 
the Poisson model will be shown to operate very satisfactorily. 

If the distribution of z is Poisson with mean ) and if mortality 
occurs randomly and independently of zx, then the distribution of y, 
the number of gall-cells per flower-head, is easily shown to be also 
Poisson, the mean being 


w= — 8). (1) 
Now for a Poisson distribution with the first term omitted, 
P(z) = zi — 1) @) 


The maximum likelihood estimate of \ is obtained (David & Johnson, 
1952) by equating the mean value of x to its expectation, and so is 4, 
the solution of 

—e’): (3) 


Rider (1953) independently obtained the same equation and provided 
a brief table to help in evaluating 1. Moreover, by the usual maximum 
likelihood procedure, the variance of \ is found to be asymptotically 
of the form 

Naa +1- (4) 
where N is the number of observations on x from which Z is formed. 
Cohen (1954) has obtained more general formulae, of which (3) and 
(4) are special cases. Equations similar to (2), (3), (4), with y, w in 
place of z, are also required. 

David and Johnson remarked on the impossibility of obtaining an 
explicit expression for \ as a function of Z, with the implication that 
this is a serious disadvantage of the method of estimation. In practice, 
equation (3) can be solved rapidly by iterative or interpolatory pro- 
cesses and a table for direct reading of } and NV(i) as functions of < 
could easily be constructed. 

Table 2 summarizes the results of applying these estimation pro- 
cedures to the four distributions in Table 1, and also shows x’values 
based upon comparison of the observed frequencies with those calculated 
from insertion of the estimated parameter in equation (2). The x’ 
values for the eggs give no sign of appreciable deviation from the 
hypothesis of Poisson distributions. The fact that the gall-cell dis- 
tributions also yield low values of x” shows the hypothesis that egg 
mortality is independent of x to be not contradicted by the data to 
any appreciable extent. Both \ and f agree remarkably closely in the 
two years. 
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TABLE 2 
Summary of estimates of Poisson parameters, variances, and homogeneity tests 
1935 1936 

3.020 3.034 

2.845 2.860 

VQ) 0.0220 0.0371 

x? 4.17 (4 df.) 6.83 (4 df.) 
2.283 2.336 

A 1.962 2.028 

V(A) 0.0028 0.0088 

x? 6.89 (4 df.) 4.87 (4 df.) 


In the calculation of x2, frequencies of flower-heads with 6 or more eggs or gall-cells were combined. 


From equation (1), the maximum likelihood estimate of the mortality 
rate is 


6=1 (5) 


Here, as in most practical situations, \ is much larger than its standard 
error, and the asymptotic variance of 6 can be safely used: 


V(6) = [V(a) + (1 — 6° + ¥. (6) 
Hence, in 1935 
6 = 0.310 + 0.040 
and in 1936 
6 = 0.291 + 0.058. 


The estimates are very close to Varley’s, but the standard errors are 
larger because of the allowance that has now been made for sampling 
errors in the distribution of numbers of eggs. 


4. Plackett’s Method 


An elegant alternative to maximum likelihood estimation for the 


parameter of a truncated Poisson distribution is due to Plackett (1953). 
He showed that 


(7) 


z=2 


n, being the number of observations for the value x, is an unbiased 
estimator of \ whose efficiency never falls below 95%. The estimator 


= 


392 BIOMETRICS, SEPTEMBER 1955 
may alternatively be written 


(7.1) 


He further showed that 
V(A*) = (NA* + 2n.)/N? (8) 


is an unbiased estimator of the variance of \*._ The numerical estimates 
of \ and u» obtained in this way are almost the same as those in Table 
3, and the mortality rates are estimated as 


6* = 0.306 + 0.042 


in 1935 and 
6* = 0.273 + 0.061 


in 1936. Thus estimates with only slightly larger standard errors, and 
standard errors that are now known to be valid even in small samples, 
are obtained with much less labour. 


5. Discussion 


Varley’s observations on the numbers of eggs per flower-head are 
in close agreement with what would be expected if the number of eggs 
per flower-head followed a Poisson distribution and the proportion of 
eggs failing to produce gall-cells were independent of the number of 
eggs per flower-head. This is no demonstration that competition 
within the flower-head is completely absent: doubtless competition 
and an increased death rate of eggs must occur if the number of eggs 
per flower-head is substantially increased, since a flower-head cannot 
hold a gall with more than about 15 gall-cells, but in samples of the 
size taken in 1935 and 1936 the effect lay within the limits of error. 
This re-analysis modifies Varley’s previous conclusions (loc. cit. p. 
175), for the available information is in fact insufficient to demonstrate 
an increase of larval mortality with increasing size of egg batch. It 
would indeed be interesting to know more about the sensitivity of the 
tests used here, and in particular to know how large an effect of compe- 
tition could escape detection in this number of observations, but that 
appears to involve a much more difficult analysis. One flaw in both 
the present analysis and that used previously by Varley is that the 
x’ tests relate to any form of heterogencity in the mortality rate for 
different numbers of eggs per flower-head. The real interest. lies, 
however, in the possibility that the mortality rate shows a regular trend 
as x increases; one would therefore like to be able to isolate a single 
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degree of freedom in the appropriate x’ that would represent a regression 
of mortality rate on x, but unfortunately there appear to be substantial 
difficulties in thus complicating the analysis. The values of x’ in 
Table 2 are in fact so small as to leave little hope that such a component 
would be statistically significant, and inspection of the observed and 
expected frequencies supports this opinion. Nothing can be said about 
egg mortality at higher densities, but within the range of the numbers 
per flower-head observed an average of about 30% or slightly less seems 
appropriate for both years. 

It may be objected that the validity of the estimate rests upon the 
assumption of a Poisson distribution for the egg frequencies, an as- 
sumption for which there is no theoretical basis. Analogous assump- 
tions are of course implicit in many techniques for estimating a popula- 
tion characteristic from a sample, and the justification lies in the choice 
of a technique that makes the estimate relatively insensitive to the 
exact terms of the assumption. That no significant deviation from the 
Poisson is found does not prove that the distribution was Poisson, any 
more than the comparison of the egg and gall-cell frequencies has 
proved that the mortality is constant. The Poisson distribution is 
introduced as a convenient simple model that is not contradicted by 
the data; since the purpose of the analysis is only to examine the ratio 
of comparable means for the two frequency distributions, the precise 
algebraic formulation used is not very important. Any attempt to 
examine the mortality between oviposition and gall-cell formation 
without specification of a model for the egg distribution is doomed to 
failure because each egg frequency must be represented by a separate 
parameter. Varley’s analysis in fact assumed that the true egg dis- 
tribution was exactly proportional to that observed, though one would 
scarcely seriously maintain that in 1935 nearly 0.7% of flower-heads 
had exactly 12 eggs while none had 10, 11, or 13!. All that is required 
for the estimation of the mortality rate is an estimate of the number of 
eggs per flower-head in a particular group of flowers and an estimate 
of the number of gall-cells per flower-head in the same or an exactly 
corresponding group of flowers: the whole difficulty lies with the empty 
flower-heads, since some that contain eggs will be empty in respect of 
gall-cells. The Poisson distribution involves only one new parameter, 
and is about the simplest assumption possible on which eggs and gall- 
cells can be averaged over comparable groups of flowers; it has been 
shown to fit the data excellently in both years. Since the hypothesis 
of a Poisson distribution and a constant mortality is not contradicted, 
that of a constant mortality without specification of distribution is 
tenable under the conditions studied. An alternative might give very 
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different estimates of the mean numbers of eggs and gall-cells, because of 
a different relationship between the (unobservable) zero class and the 
others, but, if it were to fit the non-zero observations equally well, it 
would almost certainly give much the same ratio of the means. If an 
alternative such as the negative binomial (Sampford, 1954) were em- 
ployed, however, the information provided by the sample would have to 
be used in estimating a larger number of parameters,* so that larger 
standard errors and a less sensitive test of the hypothesis of random 
mortality would probably be obtained. 


6. Summary 


Records of the frequency distribution of eggs and gall-cells of the 
knapweed gall-fly in flower-heads of black knapweed have been used 
in illustration of the use of the truncated Poisson distribution for 
representing biological observations. If mortality between oviposition 
and gall-cell formation were entirely random and did not depend upon 
the density of eggs in a flower-head, a truncated Poisson distribution 
of eggs would lead to a similar distribution of cells. The analysis 
shows that, in both years of recording, the observations on eggs and 
cells were excellently described by distributions of this type, so that 
no evidence against random mortality appears. It is unlikely that 
any different conclusion would be reached by using any alternative 
formulation of the egg distribution. However, nothing is known of the 
power of the test for detecting deviations from random mortality. 

For both the 1935 and the 1936 data, the estimated mortality rate is 
about 30%, but the samples are not large enough to determine this 
very exactly: even if the information from the two years is pooled, a 
standard error of about 33% must be attached to this estimate. 

*In addition, new technical difficulties might be introduced by the cell distribution assuming a 


much more complicated form than the egg distribution (though this does not occur for the negative 
binomial). 
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QUERIES 


Greorce W. SNeEvEcoR, Editor 


QUERY: [I have an experiment with several lots with different 
117. =numbers of animals. There is strong evidence (including Bart- 

lett’s test) that the variances in the lots are different. The experi- 
ment is similar to that in Snedecor’s text (4th edition) section 10.8. He 
does not give a test for the means if the variances differ. I should think 
some kind of weighted analysis of variance could be used. Is there any 
method available for making the analysis? 


As there is apparently strong evidence that heterogeneity 

ANSWER: of variance is present among the lots it seems advisable, as 

the inquiry suggests, to use a test which would allow for 

such heterogeneity rather than the ordinary analysis of variance test 

which is based on the assumption of a common variance for all the lots. 

It is well-known that the usual F-statistic for testing the null hy- 
pothesis of equality of lot means is 


k 
— £)° 


n—k 
= k 


j=1 


which has, under the assumption of common variance and the null 
hypothesis, an F distribution with degrees of freedom k — 1, n — k. In 
the experiment referred to in Snedecor’s book (see Table 10.12), k = 8, 
n = 56; the lot sizes n; and the lot means #; are also given there. It 
turns out that V, = 2.97 which is just short of the 1% point 3.04 but 
well above the 5% point 2.21. 

In a recent paper Box [1] has investigated the distortion in the dis- 
tribution of V, when the assumption of equality of lot variances is 
violated. In Table 4 (loc. cit), for instance, he notes that for certain 
specified lot sizes and lot variances the true level of a 5% F-test which 
employs V, may be much less or much greater than 5°%, depending on 
the particular lot sizes and variances. He also suggests as a working 
approximation to the distribution of V, under the null hypothesis of 
equality of means to regard V,/b as having an F-distribution with de- 
grees of freedom h’, h where 


395 


\ 
= 


396 BIOMETRICS, SEPTEMBER 1955 


n—k ~ ape 


i=1 


~ nk — 1 > 


(n — 


i=1 


nai} +n (n — 2n,)o% 


t=1 


b 


h’ = 


i=1 


— loi 


i=1 


(n; — Dei} 
h= 


Here o; is the true variance of the normal distribution corresponding to 
the ith lot. It is, of course, presumptuous to apply the above approxima- 
tion with the sample variance s; replacing o; . If it were applied to the 
data of Table 10.8 it would be impaired even further by the fact that 
the lot sizes are small (4 to 10). The technique is stated here, however, 
in the belief an experimenter may find it useful if he has a sample which 
involves large lot sizes. 
From the data it is found that 


A 
b 


By interpolation in the F-tables it can be seen that the probability of 
exceeding the value 3.32 is approximately 3%. Application of the ordi- 
nary F test (ignoring inequality of variances) yielded a value just short of 
the 1% point; however the approximation is too rough here to certify a 
contradiction. 

An apparently more promising method of coping with inequality of 
lot variances is to use the statistic 


b = 0.8952, h’=4.5, h = 27.7, = 3.32. 


where p= 7 


and = 72 
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Under the null hypothesis of equality of means, V, is distributed approxi- 
mately as x” with k — 1 degrees of freedom for large values of n;. A 
rationale for using V, is that when a; replaces s; , the test is equivalent 
to the likelihood ratio test for equality of means when the variances are 
known (and need not be equal). 

Welch [2] has improved the x’ approximation to V, by dividing it by 
a correction factor. Specifically, he recommends using the statistic 


1+ 


mn; —1 
w; 


i=l 
as if it were distributed as F with degrees of freedom f, , f. , where 
= k 1 


For the data of table 10.8 it turns out that 
p=289, f,=7, f,=166, V; = 4.53 


By linear interpolation in the F-tables it is seen that the 1% and 3% 
points are approximately 4.00 and 4.66 respectively. It will be noted 
that the value of V; falls short of the 3% point. It should also be pointed 
out that due to the method by which Welch developed this approximate 
distribution he can state only that it holds to order 1/(n; — 1). Since 
the lot sizes in the data range from 4 to 10 it is apparent that the ap- 
proximation is too rough to be trustworthy for the example considered. 

James [3] has also developed an approximation to the distribution of 
V, which holds to order 1/(n; — 1). Specifically, for a given level of 


significance P, say, he finds a function h(w, , w. , --- , w,) such that 
P{V. > h(w, , we, ,w.)} =P 

For an approximation to the order 1/(n; — 1) he recommends setting 

h(w, , We, We) 


+k+1 1 W; ‘ 


where x» is the value of x’ with k — 1 degrees of freedom which is ex- 
ceeded with probability P. As pointed out above, the small lot sizes of 
the example cause such an approximation to be rough. The value of h 
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computed for the data turns out to be 34.49 when P is .005. Since the 
computed value of V, is 39.36, a test based on this approximate dis- 
tribution would recommend rejection of the null hypothesis at the 4% 
level. As a consequence of the small lot sizes it is not at all surprising 
that this contradicts the decision not to reject at the }% level based on 
Welch’s approximation. 

It may also be remarked that James (loc cit.) has developed a more 
refined approximation to the distribution of V. which is of order 
(1/(n; — 1))*, but the computations involved are so exceedingly tedious 
as to discourage its use. 

In conclusion, this author’s answer to the query as to whether there 
is available some kind of weighting technique for analyzing an experi- 
ment such as that of Table 10.12 in which heterogeneity of variance is 
present, would be a qualified yes. Yes if the lot sizes are not small, in 
which case one could use the results of Welch or James. What is meant 
here by “not small’’ cannot, at this stage of available results, be cate- 
gorically defined since it is merely the order of the approximation which 
is 1/(n; — 1). Thus, for instance, if all the lots sizes exceed 10 the error 
which results in using the given approximation would be of an order 
not exceeding .01. 

A final word of warning might be added here. Since distortions due 
to inequality of variances alter the effective level of significance of the 
ordinary analysis of variance test, a policy of using the F-test and ignor- 
ing variance heterogeneity may fortuitously lead to a correct decision in 
some circumstances. For instance, if the lot sizes and variances are such 
that the effective level of 5% F-test is increased to 10 or 15%, there will 
tend to be an increase of power; consequently, an ordinary F-test, as a 
result of such distortion would tend to detect the falsehood of the null 
hypothesis more frequently than it would without distortion. However, 
since the effective level might also decrease as a result of distortion, a 
policy of ignoring variance heterogeneity is self-contradictory. 
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